Hi Maatari,

On Thursday, May 29, 2014, Maatari Daniel Okouya <okouy...@yahoo.fr> wrote:
> Many thanks Rafa,
>
> One last question. In some cases I will already have the metadata
> available in another format that will need to be translated to RDF.
>
> Let's assume I have that done. What I will get is basically a set of
> instance resources described with the vocabularies of my choice.
>
> That's where the data lifting process you are talking about comes into
> play. To be linked to the LOD, I would need to link my description to
> other datasets available on the LOD.
>
> Is there a way/pipeline available with Stanbol that starts from an RDF
> description and links it to the LOD? To be honest, I have already spotted
> things like Datalift and Silk, but I was just wondering if something like
> that was available with Stanbol.

If I have understood correctly, I would say that you can use an extension
of Google Refine which uses Stanbol to reconcile your current data with LOD
datasets imported into Stanbol, like for example DBpedia. You can find more
information here: https://code.google.com/p/lmf/wiki/GoogleRefineUsersDocumentation

Hope that helps.

Cheers,
Rafa

> Many thanks,
>
> -M-
> --
> Maatari Daniel Okouya
> Sent with Airmail
>
> On 29 May 2014 at 09:24:54, Rafa Haro (rh...@apache.org) wrote:
>
> Hi Maatari,
>
> On 29/05/14 02:27, Maatari Daniel Okouya wrote:
>
> Rafa,
>
> Many thanks for your elaborate answer.
>
> It seems from your answer that I had not completely grasped the concepts
> behind Stanbol. Its primary purpose is semantically annotating the
> content of a file for the purpose of semantic search. Although one could
> repurpose the enhancing infrastructure to obtain the generated
> description and apply some SPARQL rules to get the description into a
> desired format, it is not geared toward Linked Data out of the box. What
> I mean is generating a description that you could publish as-is, which is
> what I was looking for.
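[Editor's note: the Google Refine route Rafa suggests above follows the standard Reconciliation API: the client sends a batch of label queries as JSON and the service (here, the LMF/Stanbol extension) returns candidate entities from the imported LOD datasets. A minimal Python sketch of building and sending such a batch; the endpoint URL is an assumption and depends on your installation.]

```python
import json
from urllib import parse, request

def build_recon_queries(labels, limit=3):
    """Build a batch query in the Reconciliation API shape:
    {"q0": {"query": "...", "limit": N}, "q1": ...}."""
    return {"q%d" % i: {"query": label, "limit": limit}
            for i, label in enumerate(labels)}

def reconcile(endpoint, labels):
    """POST the batch to a reconciliation service. The endpoint URL is
    hypothetical; check your LMF/Stanbol installation for the real one."""
    body = parse.urlencode(
        {"queries": json.dumps(build_recon_queries(labels))}).encode("utf-8")
    with request.urlopen(request.Request(endpoint, data=body)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Inspect the payload without any network access:
print(json.dumps(build_recon_queries(["Barack Obama", "Paris"]), sort_keys=True))
```

Each entry in the service's response carries candidate entity URIs (e.g. DBpedia resources) with matching scores, which is what lets you link your existing RDF descriptions to the LOD cloud.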
> As you say, the best match here is the description returned by the Topic
> annotation engine, and maybe a few things extracted by Tika.
>
> Well, the primary purpose or use case wouldn't necessarily have to be
> Semantic Search. I would say that Stanbol helps in the task of extracting
> semantic metadata from content (semantic lifting). It is true that the
> most common way of metadata extraction is Entity Linking, and there is a
> reason for that: Stanbol was born as a tool for Content Management
> Systems, where companies are supposed to manage domain vocabularies that
> can be used to enrich the enterprise content. Anyway, the enhancer has
> been modularized around extraction engines, so you can perfectly well
> implement an engine for your use case and take advantage of the Stanbol
> APIs to express your extracted metadata as RDF.
>
> I mean, I still need to read a bit more, but this is what I get for now
> from your explanation and my readings.
>
> Am I close?
>
> I think so :-). Cheers,
>
> Rafa
>
> Best,
> -M-
> --
> Maatari Daniel Okouya
> Sent with Airmail
>
> On 28 May 2014 at 13:46:00, Rafa Haro (rh...@apache.org) wrote:
>
> Hi Maatari,
>
> On 27/05/14 21:05, Maatari Daniel Okouya wrote:
>
> Hi,
>
> To complete my previous question, I think it would be better for me to
> give the bigger picture of what I'm trying to achieve.
>
> I have been charged with helping to disseminate the publication content
> of my organisation. Most of it is in PDF.
>
> Therefore, I need a process to produce a meaningful RDF description of
> our content that links as much as possible to the LOD cloud and LOV
> (Linked Open Vocabularies). Hence I need to use common core vocabularies
> as much as I can, e.g. Dublin Core, schema.org, BIBO, FOAF, etc., and
> reference entities from DBpedia, for instance.
>
> Searching around the web for how to automatically generate these
> descriptions, which would include creator, publisher, primaryTopic,
> subject, theme, etc.,
> It seems to me that Apache Stanbol was the best match.
>
> With Stanbol you can enrich your content with your own vocabularies or
> datasets from the LOD cloud, as long as you import them beforehand as a
> site. Let's say that the "out of the box" enrichment process consists of
> linking pieces of text (such as entities' or concepts' names/labels) with
> entities within your datasets.
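[Editor's note: to turn that enrichment output into a publishable description, the "SPARQL rules" idea mentioned earlier in the thread can be applied to the enhancer's results, which are expressed using the FISE enhancement structure. A hedged sketch mapping linked entities to Dublin Core subjects; the exact property names should be checked against the enhancement structure your Stanbol version emits.]

```sparql
PREFIX fise: <http://fise.iks-project.eu/ontology/>
PREFIX dct:  <http://purl.org/dc/terms/>

# Map every entity annotation produced by the enhancer to a
# dct:subject statement on the content item it was extracted from.
CONSTRUCT {
  ?content dct:subject ?entity .
}
WHERE {
  ?annotation a fise:EntityAnnotation ;
              fise:entity-reference ?entity ;
              fise:extracted-from ?content .
}
```

Running a CONSTRUCT query like this over the enhancer output yields plain Dublin Core triples that could be published as-is, rather than the annotation-centric structure the enhancer returns.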