Re: Context Aware Extraction

Giovanni Tummarello Fri, 06 Jun 2014 06:31:28 -0700

The main motivation for this is to make sure the data is really relevant, and
to put HTML elements (e.g. as in scraping) together with metadata.


Sometimes one has a metadata description (e.g. a name) but not the phone
number, which is only in the HTML.

How about a JSON output that contains a configurably large chunk of
"surrounding" HTML but also the triples, e.g. in JSON-LD, standardized/normalized
as much as possible?
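
To make the idea a bit more concrete, here is a minimal sketch of what such a record could look like. Everything in it is an assumption for illustration only: the function name `context_record`, the `window` knob, the field names `source`, `surrounding_html` and `triples`, and the use of the schema.org `name` term are all hypothetical, not an existing API.

```python
import json


def context_record(page_url, html, start, end, window, jsonld_graph):
    """Bundle extracted triples with the HTML around their source element.

    start/end are character offsets of the element the triples came from;
    window is how many extra characters to keep on each side (hypothetical knob).
    """
    lo = max(0, start - window)
    hi = min(len(html), end + window)
    return {
        "source": page_url,
        "surrounding_html": html[lo:hi],
        # Triples expressed as a small JSON-LD document (assumed vocabulary).
        "triples": {
            "@context": {"name": "http://schema.org/name"},
            "@graph": jsonld_graph,
        },
    }


# The name is marked up as metadata; the phone number is only in plain HTML.
html = '<div><span itemprop="name">Bob</span> tel: 555-0100</div>'
start = html.index("<span")
end = html.index("</span>") + len("</span>")
graph = [{"@id": "http://example.org/path#bob", "name": "Bob"}]

record = context_record("http://example.org/path", html, start, end, 20, graph)
print(json.dumps(record, indent=2))
```

With a large enough window the phone number travels along with the triples, so a consumer can pick it up from the HTML even though no metadata describes it.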

I think this could be useful to better understand web pages, but you're
right with your point 3): I personally don't have any specific need just now,
so I wouldn't feel like pushing for development in this direction just yet.

Gio


On Fri, Jun 6, 2014 at 11:51 AM, Szymon Danielczyk <
[email protected]> wrote:

> Hi Lewis, Guys
>
> Just to understand this better. Does this mean that if some info was
> extracted from
>
> http://example.org/path (let's say from the head section of the page)
>
>
> A)
>
> the graph part becomes
>
> <http://example.org/path#head>
>
> but if it came from, let's say, the HTML5 "article" tag, it will be
>
> <http://example.org/path#article>
>
> B)
> Or it is more like
>
> <s> <p> <o> <http://example.org/path> .
> <s> <hasContext> <http://example.org/path#context> <http://example.org/path> .
> <http://example.org/path#context> <foundInside> "html/head" <http://example.org/path> .
> <http://example.org/path#context> <foundAtDate> "01-May-2014" <http://example.org/path> .
> <http://example.org/path#context> <foundBy> "...." <http://example.org/path> .
> etc ..
>
>
> I would like to ask:
>
> 1) Were you thinking more of approach A or B?
>
> 2) What tags will this feature support? Maybe some subset like body and head,
> plus some of the new HTML5 ones (article, aside, header, footer, etc.)?
> Or maybe you thought of giving the full XPath to the section, like
> "html/body/article/div[1]"?
>
> 3) Did you guys think about a practical use case already? How could this
> information be useful to someone?
>
> Cheers
> Szymon
>
> On 6 June 2014 00:35, Lewis John Mcgibbney <[email protected]>
> wrote:
>
> > Hi Folks,
> > Giovanni and I were recently discussing the concept of context-aware
> > triple extraction. An example of this would be recording 'where' the triples
> > came from (within the web page) as well as the triples themselves.
> > This of course bears a close resemblance to N-Quads; however, we substitute
> > the additional graph constituent with the 'context' one suggested above.
> > Does anyone have comments and/or suggestions on how we could implement a
> > context-aware extractor model/API on top of what we currently have?
> > Lewis
> >
> > --
> > *Lewis*
> >
>
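
For reference, the two options Szymon describes (A and B) can be sketched as quad generators. This is only an illustration under stated assumptions: the helper names `quad`, `option_a` and `option_b` are hypothetical, and the predicates `<hasContext>` and `<foundInside>` are the placeholder names from the quoted mail, not an agreed vocabulary.

```python
def quad(s, p, o, g):
    """Format one N-Quads-style statement; s, p, o are already-serialized terms."""
    return f"{s} {p} {o} <{g}> ."


def option_a(s, p, o, page, section):
    # Option A: the graph IRI itself carries the section as a fragment.
    return [quad(s, p, o, f"{page}#{section}")]


def option_b(s, p, o, page, location):
    # Option B: the graph stays the page IRI; a separate context node
    # describes where in the page the triple was found.
    ctx = f"<{page}#context>"
    return [
        quad(s, p, o, page),
        quad(s, "<hasContext>", ctx, page),
        quad(ctx, "<foundInside>", f'"{location}"', page),
    ]


page = "http://example.org/path"
for line in option_a("<s>", "<p>", "<o>", page, "head"):
    print(line)
for line in option_b("<s>", "<p>", "<o>", page, "html/head"):
    print(line)
```

Option A keeps one quad per triple but overloads the graph IRI; option B costs extra quads per triple but leaves room for further context properties such as a date or extractor name.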
