The main motivation for this is to make sure the data is really relevant, and to put HTML elements (e.g. obtained by scraping) together with the metadata.
Sometimes one has a metadata description (e.g. a name) but not the phone number, which is only in the HTML. How about a JSON output that carries a configurably large "surrounding" HTML snippet, but also the triples, e.g. standardized/normalized as much as possible as JSON-LD?

I think this could be useful to better understand web pages, but you're right with your point 3): I personally don't have any specific need just now, so I wouldn't feel like pushing for development in this direction just yet.

Gio

On Fri, Jun 6, 2014 at 11:51 AM, Szymon Danielczyk <[email protected]> wrote:

> Hi Lewis, Guys
>
> Just to understand this better. Does this mean that if some info was
> extracted from
>
> http://example.org/path, let's say from the head section of the page,
>
> A)
>
> the graph part becomes
>
> <http://example.org/path#head>
>
> but if it came from, let's say, the HTML5 "article" tag it will be
>
> <http://example.org/path#article>
>
> B)
> Or is it more like
>
> <s> <p> <o> <http://example.org/path> .
> <s> <hasContext> <http://example.org/path#context> <http://example.org/path> .
> <http://example.org/path#context> <foundInside> "html/head" <http://example.org/path> .
> <http://example.org/path#context> <foundAtDate> "01-May-2014" <http://example.org/path> .
> <http://example.org/path#context> <foundBy> "...." <http://example.org/path> .
> etc ..
>
> I would like to ask:
>
> 1) Were you thinking more of the A or the B approach?
>
> 2) What tags will this feature support? Maybe some subset like body, head
> plus some of the new HTML5 ones: article, aside, header, footer etc.?
> Or maybe you thought of giving the full XPath to the section, like
> "html/body/article/div[1]"?
>
> 3) Did you guys think about some practical use case already? How could this
> information be useful to someone?
>
> Cheers
> Szymon
>
> On 6 June 2014 00:35, Lewis John Mcgibbney <[email protected]>
> wrote:
>
> > Hi Folks,
> > Giovanni and myself were recently discussing the concept of context-aware
> > triples extraction.
> > An example of this would be the 'where' the triples
> > came from (within the WebPage) as well as the triple itself.
> > This of course bears close resemblance to N-Quads, however we substitute
> > the additional graph constituent with the 'context' one suggested above.
> > Does anyone have comments and/or suggestions on how we could implement a
> > context-aware extractor model/API on top of what we currently have?
> > Lewis
> >
> > --
> > *Lewis*
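
To make the A/B question and the JSON idea in this thread a bit more concrete, here is a minimal Python sketch. It is not an existing extractor API: every function name (`nquad`, `approach_a`, `approach_b`, `json_record`), the JSON keys `surroundingHtml` and `triples`, and the `max_chars` cut-off are assumptions made up for illustration. The predicates `<hasContext>`, `<foundInside>` and `<foundAtDate>` are the placeholders from Szymon's mail; a real N-Quads serialization would need absolute IRIs there.

```python
import json

PAGE = "http://example.org/path"  # example page IRI used in the thread


def iri(value):
    """Wrap a string in angle brackets, N-Quads style."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def nquad(s, p, o, g):
    """Join subject, predicate, object and graph label into one quad line."""
    return f"{s} {p} {o} {g} ."


def approach_a(s, p, o, section):
    """Approach A: the graph label itself names the page section."""
    return [nquad(s, p, o, iri(f"{PAGE}#{section}"))]


def approach_b(s, p, o, found_inside, found_at_date):
    """Approach B: the graph stays the page IRI; extra quads describe the context."""
    graph = iri(PAGE)
    ctx = iri(f"{PAGE}#context")
    return [
        nquad(s, p, o, graph),
        nquad(s, "<hasContext>", ctx, graph),  # placeholder predicate from the thread
        nquad(ctx, "<foundInside>", literal(found_inside), graph),
        nquad(ctx, "<foundAtDate>", literal(found_at_date), graph),
    ]


def json_record(name, surrounding_html, max_chars=200):
    """Hypothetical JSON output: a JSON-LD node plus a size-limited HTML snippet.

    Truncating by character count is only a crude stand-in for a
    "configurably large" surrounding HTML window.
    """
    return json.dumps(
        {
            "surroundingHtml": surrounding_html[:max_chars],
            "triples": {
                "@context": {"name": "http://schema.org/name"},
                "@id": PAGE,
                "name": name,
            },
        },
        indent=2,
    )


if __name__ == "__main__":
    print("\n".join(approach_a("<s>", "<p>", "<o>", "head")))
    print("\n".join(approach_b("<s>", "<p>", "<o>", "html/head", "01-May-2014")))
    print(json_record("Example name", "<div>Example name, phone in plain HTML</div>", 40))
```

With A, the context costs nothing extra but can only carry one piece of information (the section). With B, each extracted triple drags along a few context quads, but things like the XPath, the date, or the extractor name all fit.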
