Hi Dileepa,
On 13 March 2014 20:28, Dileepa Jayakody <[email protected]> wrote:

> Hi Raffaele,
>
> Thanks again for your suggestions.
> I think it will be a great addition to this project to make the data
> imported from OpenRefine interoperable with other datasets in Marmotta.
> I will follow up with the OpenRefine community to check whether they
> support the DCAT vocab in their latest release. If it doesn't support
> DCAT, do you think implementing DCAT support in OpenRefine is a task
> within this project's scope?

Basically yes, it could be a task within this project's scope. I think a
preliminary check within RDF Refine is needed first.

> On Thu, Mar 13, 2014 at 4:21 PM, Raffaele Palmieri <
> [email protected]> wrote:
>
> > Hi Dileepa,
> > some thoughts that I have also shared with other Marmotta team members
> > regarding the integration with OpenRefine.
> > For the second level of integration, which fundamentally exports both
> > CSV and other data towards Marmotta to produce RDF, it would be
> > interesting to try to add functionality to OpenRefine for supplying
> > additional metadata for a dataset, using for example the DCAT
> > vocabulary [1].
> > I don't remember if this feature is covered by the GRefine RDF
> > Extension, of which a new release (ALPHA 0.9.0) is available [2].
> > If a dataset is supplied with DCAT metadata, Marmotta could expose it
> > to facilitate its interoperability with other datasets.
> > To do that, Marmotta also needs to store structured datasets, not
> > necessarily instantiated as RDF triples.
>
> I think Marmotta's KiWi triple store can be connected to RDBMS back ends
> (MySQL, Postgres, H2), therefore the above requirement of storing
> structured data in Marmotta's backend is fulfilled. Please correct me if
> I'm wrong.

No, the KiWi triple store doesn't manage simple structured files (e.g.
CSV), only instances of triples. The storage I have in mind is quite
simple; even the file system could be used, precisely so that the dataset
can be retrieved from Marmotta at a later time using, for example,
dcat:downloadURL. Clearly this dataset would be a copy of the one tooled
with Refine, and could be overwritten at any time.
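To make this concrete, here is a minimal sketch of the kind of DCAT
description Marmotta could expose for a stored CSV file. I'm using Python
with rdflib purely for illustration; the URIs are hypothetical and this
does not reflect any existing Marmotta API:

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

# Hypothetical URIs for a dataset refined with OpenRefine and stored in
# Marmotta.
dataset = URIRef("http://localhost:8080/marmotta/resource/dataset/example")
dist = URIRef("http://localhost:8080/marmotta/resource/dataset/example/csv")

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example OpenRefine dataset")))
g.add((dataset, DCAT.distribution, dist))

g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCTERMS.format, Literal("text/csv")))
# dcat:downloadURL is what lets a client fetch the raw (3-star) file back
# from wherever Marmotta stores it.
g.add((dist, DCAT.downloadURL,
       URIRef("http://localhost:8080/marmotta/files/example.csv")))

print(g.serialize(format="turtle"))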
> In summary I think we are looking at 2 main tasks now.
> 1. Ability to import data from the OpenRefine process

Yes, and in addition to linked datasets (4 and 5 stars), also structured
datasets in simpler formats (CSV, etc.), supplied for example with DCAT
metadata.

> 2. Ability to make the imported OpenRefine data interoperable with other
> datasets in Marmotta (potentially using the DCAT vocab)

Yes, with the possibility to retrieve them from Marmotta, so also 3-star
datasets.
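On task 1: once OpenRefine (or its RDF extension) has produced triples,
pushing them into Marmotta should boil down to an HTTP POST against the
import web service. A rough sketch with Python's requests library; the
/import/upload path and the context parameter follow my reading of the
import documentation and should be double-checked against a running
instance:

import requests

MARMOTTA = "http://localhost:8080/marmotta"

def import_rdf(turtle_data, context=None):
    """Upload serialized RDF to Marmotta's import web service."""
    params = {"context": context} if context else {}
    resp = requests.post(
        MARMOTTA + "/import/upload",
        data=turtle_data.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        params=params,
    )
    resp.raise_for_status()

# Usage, assuming a Turtle file exported from the refining step:
# import_rdf(open("refined.ttl").read())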
> More ideas, suggestions are most welcome.

Before you prepare the proposal, we should seek advice from the Marmotta
team.

> Thanks,
> Dileepa

Regards,
Raffaele.

> > What do you think about it?
> > Regards,
> > Raffaele.
> >
> > [1] http://www.w3.org/TR/vocab-dcat/
> > [2] https://github.com/fadmaa/grefine-rdf-extension/releases/tag/v0.9.0
> >
> > On 11 March 2014 10:29, Dileepa Jayakody <[email protected]>
> > wrote:
> >
> > > Thank you very much Raffaele for the detailed explanation.
> > >
> > > I will do some more background research on Marmotta data import and
> > > OpenRefine and come up with the questions and ideas I get.
> > >
> > > Also, any new suggestions and directions to evolve this project idea
> > > are welcome.
> > >
> > > Thanks,
> > > Dileepa
> > >
> > > On Tue, Mar 11, 2014 at 3:14 AM, Raffaele Palmieri <
> > > [email protected]> wrote:
> > >
> > > > Hi Dileepa,
> > > > pleased to meet you, and good to know of your interest in
> > > > contributing to Marmotta.
> > > > As discussed on Marmotta's mailing list, this integration could be
> > > > reached at various levels.
> > > > A first level is reached by refining your messy data with the
> > > > Refine tools, using the RDF extension, which already offers a
> > > > graphical UI to model RDF data by producing an RDF skeleton, and
> > > > then importing the new data into Marmotta, compliant with the
> > > > created skeleton.
> > > > This integration mode was implemented in the past using [1], but
> > > > needs to be updated because:
> > > > 1) Google Refine became OpenRefine;
> > > > 2) LMF became Marmotta in its linked-data core functionalities.
> > > > This update also requires work on project configuration, because
> > > > OpenRefine has a different configuration than Apache Marmotta.
> > > > Whatever kind of integration is achieved, I think that work on
> > > > project configuration is required.
> > > > A second level of integration is reached if you break up the RDF
> > > > into CSV and a set of RDF mappings (aka the RDF skeleton).
> > > > So, starting from an exported project that contains the CSV and
> > > > the related actions to produce the RDF skeleton, the integration
> > > > is expected to produce the final RDF in Marmotta's world, probably
> > > > performing steps similar to the GRefine RDF Extension.
> > > > For that second level of integration, the export functionality and
> > > > the RDF skeleton should be explored to verify what is easily
> > > > exportable.
> > > > At the moment these are the integration hypotheses; clearly the
> > > > second appears more complex, but the first also involves
> > > > non-trivial work.
> > > > Since you have experience with other projects related to the
> > > > Semantic Web, such as Apache Stanbol, feel free to propose other
> > > > integration hypotheses.
> > > > Regards,
> > > > Raffaele.
> > > >
> > > > [1] https://code.google.com/p/lmf/wiki/GoogleRefineExtension
> > > >
> > > > On 10 March 2014 21:35, Dileepa Jayakody <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'm Dileepa, a research student from the University of Moratuwa,
> > > > > Sri Lanka, with a keen interest in the linked-data and
> > > > > semantic-web domains. I have worked on linked-data related
> > > > > projects such as Apache Stanbol, and I'm experienced with
> > > > > related technologies like RDF, SPARQL, FOAF etc. I'm very much
> > > > > interested in applying for GSoC this year with Apache Marmotta.
> > > > >
> > > > > I would like to open up a discussion on the OpenRefine
> > > > > integration project idea [1]. AFAIU, the goal of this project is
> > > > > to import data into the Marmotta triple store (the KiWi triple
> > > > > store by default) from OpenRefine, after the data has been
> > > > > refined and exported.
> > > > >
> > > > > I did some background reading on the Marmotta data import
> > > > > process [2], which explains different ways to import RDF data
> > > > > into the back-end triple store.
> > > > > Currently OpenRefine exports data in several formats: CSV, TSV,
> > > > > XLS, HTML tables [3]. So I think the main task of this project
> > > > > will be to convert this exported data into RDF and make it
> > > > > compatible with the Marmotta data import process. I did some
> > > > > quick research on how to do so; there are several options to
> > > > > convert such data to RDF:
> > > > >
> > > > > 1. RDF extension to OpenRefine:
> > > > > https://github.com/sparkica/rdf-extension
> > > > > 2. RDF Refine: http://refine.deri.ie/
> > > > > 3. D2R Server: http://d2rq.org/d2r-server (if the OpenRefine
> > > > > data is imported from a SQL database)
> > > > >
> > > > > Apart from the data conversion process from OpenRefine to RDF,
> > > > > what are the other tasks to be done in this project?
> > > > > Appreciate your thoughts and suggestions.
> > > > >
> > > > > Thanks,
> > > > > Dileepa
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/MARMOTTA-202
> > > > > [2] http://wiki.apache.org/marmotta/ImportData
> > > > > [3]
> > > > > https://github.com/OpenRefine/OpenRefine/wiki/Exporters#exporting-projects
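P.S. On the second level of integration quoted above (a CSV export plus
the RDF-skeleton mappings), a minimal sketch of the kind of replay step
involved. The column-to-property mapping below is an ad-hoc stand-in for
the real RDF skeleton, whose actual GRefine format would need to be
parsed instead; rdflib is again used just for illustration:

import csv
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")

# Hypothetical stand-in for the RDF skeleton: column name -> RDF property.
MAPPING = {"name": EX.name, "city": EX.city}

def csv_to_rdf(path):
    """Replay a (simplified) skeleton over a CSV export into a graph."""
    g = Graph()
    with open(path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            subject = URIRef("http://example.org/row/%d" % i)
            for column, prop in MAPPING.items():
                if row.get(column):
                    g.add((subject, prop, Literal(row[column])))
    return g

# The resulting graph could then be pushed to Marmotta, e.g. with the
# import_rdf() helper sketched earlier in this mail:
# import_rdf(csv_to_rdf("refined-export.csv").serialize(format="turtle"))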
