Hi Folks,

On Sat, Jun 7, 2014 at 2:08 AM, <[email protected]> wrote:
> > It sounds a little like nanopubs, which can be manipulated using a
> > library by Thomas Kuhn [1].
>
> Thank you for the link.
>
> Hi Lewis, guys,
>
> Just to understand this better: does this mean that if some info was
> extracted from
>
> http://example.org/path
>
> let's say from the head section of the page, then
>
> A)
>
> the graph part becomes
>
> <http://example.org/path#head>
>
> but if it came from, let's say, the HTML5 "article" tag it will be
>
> <http://example.org/path#article>
>
> B)
> or is it more like
>
> <s> <p> <o> <http://example.org/path> .
> <s> <hasContext> <http://example.org/path#context> <http://example.org/path> .
> <http://example.org/path#context> <foundInside> "html/head" <http://example.org/path> .
> <http://example.org/path#context> <foundAtDate> "01-May-2014" <http://example.org/path> .
> <http://example.org/path#context> <foundBy> "...." <http://example.org/path> .
> etc ..
>
> I would like to ask:
>
> 1) Were you thinking more of approach A or B?

B

> 2) What tags will this feature support? Maybe some subset like body, head
> plus some of the new HTML5 ones: article, aside, header, footer etc.?
> Or maybe you thought of giving the full XPath to the section, like
> "html/body/article/div[1]"?

We have to define this. Maybe define an Any23 specification? This is highly relevant material that we can really leverage.

> 3) Did you guys think about a practical use case already? How could this
> information be useful to someone?

Yes... I am working on a Firefox extension for Any23 which will let you visualize the implicit markup of the page... but more importantly where on the page the markup came from, i.e. its context.

> The main motivation for this is to make sure the data is really relevant and
> that HTML elements (e.g. as in scraping) are put together with metadata.

+1

> How about a JSON output that contains a configurably large amount of
> "surrounding" HTML but also the triples, e.g. in JSON-LD,
> standardized/normalized as much as possible?
>

Would be reasonably trivial to implement... take for example the way we currently write triples to a stream and access the stream via stream.toString(). The defining characteristic of the stream is that every triple is separated by a '\n', IIRC. So we could take advantage of this within toJSON(), for example.

> I think this could be useful to better understand web pages, but you're
> right with your point 3): I personally don't have any specific need just now,
> so I wouldn't feel like pushing for development in this direction just yet.

This was only a suggestion, as it seems like an excellent way for people to use Any23... plus it will provide a context-aware aspect to Any23, which AFAIK no other parser/extractor implementation does yet.

Have a great weekend folks.
Lewis
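
P.S. To make the toJSON() idea a bit more concrete, here is a rough Python sketch. It is not Any23 code: the function name to_json, the output field names and the sample statements (borrowed from approach B above) are all just assumptions for illustration. It relies only on the '\n'-separated serialization mentioned above, handles only plain <IRI> and "literal" terms, and emits plain JSON rather than real JSON-LD.

```python
import json
import re

# A term is either an <IRI> or a plain "literal". This is a deliberate
# simplification: no escapes, language tags, datatypes or blank nodes.
TERM = r'(<[^>]*>|"[^"]*")'
# subject, predicate, object, optional graph (the context), then a final dot.
STATEMENT = re.compile(rf'^{TERM}\s+{TERM}\s+{TERM}(?:\s+{TERM})?\s*\.$')


def strip_term(term):
    """Drop the surrounding <> or quotes (IRIs and literals look alike afterwards)."""
    return term[1:-1]


def to_json(serialized):
    """Hypothetical helper: turn '\\n'-separated statements into a JSON array."""
    records = []
    for line in serialized.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. after a trailing newline
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"unsupported statement: {line!r}")
        subject, predicate, obj, graph = match.groups()
        record = {
            "subject": strip_term(subject),
            "predicate": strip_term(predicate),
            "object": strip_term(obj),
        }
        if graph is not None:
            # The fourth term says where the statement came from.
            record["graph"] = strip_term(graph)
        records.append(record)
    return json.dumps(records, indent=2)


# Placeholder statements in the style of approach B.
sample = (
    '<s> <p> <o> <http://example.org/path> .\n'
    '<http://example.org/path#context> <foundInside> "html/head" <http://example.org/path> .\n'
)

if __name__ == "__main__":
    print(to_json(sample))
```

A real implementation would want a proper N-Quads parser instead of a regular expression, and the "surrounding" HTML could then be attached to each record as one more field.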
