Hi Folks,

On Sat, Jun 7, 2014 at 2:08 AM, <[email protected]> wrote:

>
> >
> > It sounds a little like nanopubs, which can be manipulated using a
> > library by Thomas Kuhn [1].
>
> Thank you for the link.


> Hi Lewis, Guys
>
> Just to understand this better: does this mean that if some info was
> extracted from
>
> http://example.org/path, let's say from the head section of the page,
>
>
> A)
>
> the graph part becomes
>
> <http://example.org/path#head>
>
> but if it came from, let's say, an HTML5 "article" tag, it will be
>
> <http://example.org/path#article>
>
> B)
> Or is it more like
>
> <s> <p> <o> <http://example.org/path> .
> <s> <hasContext> <http://example.org/path#context> <http://example.org/path> .
> <http://example.org/path#context> <foundInside> "html/head" <http://example.org/path> .
> <http://example.org/path#context> <foundAtDate> "01-May-2014" <http://example.org/path> .
> <http://example.org/path#context> <foundBy> "...." <http://example.org/path> .
> etc.
>
>
> I would like to ask:
>
> 1) Were you thinking more along the lines of approach A or approach B?
>

B
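For illustration, approach B could be produced along these lines. This is a
minimal Java sketch, not existing Any23 code: the class ContextQuads, the
method withContext() and the http://example.org/vocab# namespace standing in
for hasContext/foundInside/foundAtDate/foundBy are all hypothetical
placeholders. It keeps the extracted triple in the page's graph and hangs the
provenance off a separate context node, as in the quads quoted above.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextQuads {

    // Hypothetical vocabulary namespace, for illustration only.
    static final String VOCAB = "http://example.org/vocab#";

    static String iri(String value) {
        return "<" + value + ">";
    }

    // Escapes a string as an N-Quads literal (backslash, quote, LF, CR).
    static String literal(String value) {
        String escaped = value
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
        return "\"" + escaped + "\"";
    }

    // Subject, predicate and object are already serialized; the graph is a plain IRI.
    static String quad(String s, String p, String o, String graph) {
        return s + " " + p + " " + o + " " + iri(graph) + " .";
    }

    // Returns the extracted triple plus its context statements as N-Quads lines,
    // all placed in the graph named after the page URL.
    static List<String> withContext(String s, String p, String o, String pageUrl,
                                    String foundInside, String foundAtDate, String foundBy) {
        String context = iri(pageUrl + "#context");
        List<String> quads = new ArrayList<>();
        quads.add(quad(s, p, o, pageUrl));
        quads.add(quad(s, iri(VOCAB + "hasContext"), context, pageUrl));
        quads.add(quad(context, iri(VOCAB + "foundInside"), literal(foundInside), pageUrl));
        quads.add(quad(context, iri(VOCAB + "foundAtDate"), literal(foundAtDate), pageUrl));
        quads.add(quad(context, iri(VOCAB + "foundBy"), literal(foundBy), pageUrl));
        return quads;
    }

    public static void main(String[] args) {
        // "example-extractor" is a made-up value for foundBy.
        List<String> quads = withContext(
                iri("http://example.org/s"), iri("http://example.org/p"), literal("o"),
                "http://example.org/path", "html/head", "01-May-2014", "example-extractor");
        quads.forEach(System.out::println);
    }
}
```

The single #context fragment mirrors the example; a page with several
extraction locations would need one distinct context node per location.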


>
> 2) What tags will this feature support? Maybe some subset like body and head
> plus some of the new HTML5 ones (article, aside, header, footer, etc.)?
> Or maybe you were thinking of giving the full XPath to the section, like
> "html/body/article/div[1]"?
>

We have to define this.
Maybe we should define an Any23 specification?
This is highly relevant material that we can really leverage.
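If we went the full-path route, a path like "html/body/article/div[1]" could
be computed from a parsed DOM roughly as follows. This is a hypothetical Java
sketch using only the JDK's org.w3c.dom API; the class SectionPath and its
methods are made up for illustration, and parse() only handles well-formed
markup (real-world HTML would need a lenient HTML parser first).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class SectionPath {

    // Parses well-formed (X)HTML into a W3C DOM; wraps checked exceptions.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    // One path step: the lower-cased tag name, plus a 1-based [n] index only
    // when the parent has more than one child element with the same name.
    static String stepFor(Element e) {
        String name = e.getTagName().toLowerCase(Locale.ROOT);
        Node parent = e.getParentNode();
        if (parent == null) return name;
        int index = 1;
        int sameNamed = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && ((Element) c).getTagName().equalsIgnoreCase(name)) {
                sameNamed++;
                if (c == e) index = sameNamed;
            }
        }
        return sameNamed > 1 ? name + "[" + index + "]" : name;
    }

    // Walks from the element up to the root element, prepending one step per level.
    static String pathOf(Element element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            String step = stepFor((Element) n);
            path.insert(0, path.length() == 0 ? step : step + "/");
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><title>t</title></head>"
                + "<body><article><div>a</div><div>b</div></article></body></html>");
        Element firstDiv = (Element) doc.getElementsByTagName("div").item(0);
        System.out.println(pathOf(firstDiv)); // html/body/article/div[1]
    }
}
```

The [n] index is only emitted when a parent has several same-named children,
which reproduces the "html/head" and "div[1]" style used in this thread; an
always-indexed variant (html[1]/body[1]/...) would be more uniform but noisier.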


>
> 3) Have you guys thought about a practical use case already? How could
> this information be useful to someone?
>

Yes...
I am working on a Firefox extension for Any23 which will let you visualize
the implicit markup of the page... but, more importantly, where on the page
the markup came from, i.e. its context.



> The main motivation for this is to make sure the data is really relevant,
> and to put HTML elements (e.g. as in scraping) together with metadata.
>


 +1

>
> How about a JSON output that includes a configurably large "surrounding"
> HTML fragment but also the triples, e.g. in JSON-LD that is as
> standardized/normalized as possible?
>

This would be reasonably trivial to implement. Take, for example, the way we
currently write triples to a stream and then access the stream via
stream.toString(). The defining characteristic of the stream is that every
triple is separated by a '\n', IIRC, so we could take advantage of this
within toJSON(), for example.
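Roughly, assuming the '\n'-per-triple behaviour recalled above, a toJSON()
could look like this. Hypothetical Java sketch: the class TriplesJson, the
toJSON() signature and the "html"/"triples" field names are made up for
illustration, and the triples are kept as raw N-Triples strings rather than
real JSON-LD, which would need a proper JSON-LD serializer.

```java
import java.util.ArrayList;
import java.util.List;

public class TriplesJson {

    // Escapes a Java string as a JSON string literal.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters must be escaped as 4-digit hex.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Splits the serialized stream on '\n', skips blank lines, and emits one
    // JSON array entry per triple next to the surrounding HTML fragment.
    static String toJSON(String ntriples, String surroundingHtml) {
        List<String> triples = new ArrayList<>();
        for (String line : ntriples.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) triples.add(jsonString(trimmed));
        }
        return "{\"html\":" + jsonString(surroundingHtml)
                + ",\"triples\":[" + String.join(",", triples) + "]}";
    }

    public static void main(String[] args) {
        String stream = "<http://example.org/s> <http://example.org/p> \"o\" .\n"
                + "<http://example.org/s> <http://example.org/q> <http://example.org/x> .\n";
        System.out.println(toJSON(stream, "<article>...</article>"));
    }
}
```

Splitting on '\n' only holds if no serialized literal contains a raw newline;
N-Triples escapes newlines inside literals, so each line is one triple.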


>
> I think this could be useful to better understand web pages, but you're
> right with your point 3): I personally don't have any specific need just
> now, so I wouldn't feel like pushing development in this direction just yet.
>

This was only a suggestion, as it seems like an excellent way for people to
use Any23... plus it would provide a context-aware aspect to Any23, which
AFAIK no other parser/extractor implementation offers yet.

Have a great weekend folks.
Lewis
