Hi all, any feedback on the OpenRefine integration? I'm asking Sergio in particular, who initiated the Jira issue. Cheers, Raffaele.
---------- Forwarded message ----------
From: Raffaele Palmieri <[email protected]>
Date: Thursday, 13 March 2014
Subject: [GSoC 2014] MARMOTTA-202 : OpenRefine import engine
To: [email protected]

Hi Dileepa,

On 13 March 2014 20:28, Dileepa Jayakody <[email protected]> wrote:

> Hi Raffaele,
>
> Thanks again for your suggestions.
> I think it will be a great addition to this project to make the data
> imported from OpenRefine interoperable with other datasets in Marmotta. I
> will follow up with the OpenRefine community to check whether their latest
> release supports the DCAT vocabulary. If it doesn't support DCAT, do you
> think implementing DCAT support in OpenRefine is a task within this
> project's scope?

Basically yes, it could be a task within this project's scope. I think a
preliminary check is needed within RDF Refine.

> On Thu, Mar 13, 2014 at 4:21 PM, Raffaele Palmieri <
> [email protected]> wrote:
>
> > Hi Dileepa,
> > some thoughts that I also share with the other Marmotta team members
> > regarding the integration with OpenRefine.
> > For the second level of integration, which essentially exports both CSV
> > and other data towards Marmotta to produce RDF, it would be interesting
> > to try adding a feature in OpenRefine to supply additional metadata for
> > the dataset, using for example the DCAT vocabulary [1].
> > I don't remember whether this feature is covered by the GRefine RDF
> > Extension, of which a new release (ALPHA 0.9.0) is available [2].
> > If the dataset is supplied with DCAT metadata, Marmotta could expose it
> > to facilitate its interoperability with other datasets.
> > To do that, Marmotta also needs to store structured datasets, not
> > necessarily instantiated as RDF triples.
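[Editor's note: the DCAT metadata discussed above could look roughly like the following. This is a minimal sketch in plain Python that assembles a Turtle fragment by hand; the dataset URI, title, and download URL are invented placeholders, not part of the actual Marmotta or OpenRefine code.]

```python
# Sketch: a DCAT description for a CSV dataset exported from OpenRefine,
# assembled as a Turtle string. All URIs and titles below are hypothetical
# placeholders for illustration only.

def dcat_description(dataset_uri, title, download_url, media_type="text/csv"):
    """Return a Turtle fragment describing one dataset with DCAT."""
    return f"""\
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<{dataset_uri}> a dcat:Dataset ;
    dct:title "{title}" ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:downloadURL <{download_url}> ;
        dcat:mediaType "{media_type}"
    ] .
"""

turtle = dcat_description(
    "http://example.org/dataset/refined-cities",    # hypothetical dataset URI
    "Cities cleaned with OpenRefine",               # hypothetical title
    "http://example.org/files/refined-cities.csv",  # the copy Marmotta would serve
)
print(turtle)
```

The dcat:downloadURL property is what would let Marmotta hand back the stored CSV copy later, as suggested in the message above.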
> I think Marmotta's KiWi triple store can be connected to RDBMS back ends
> (MySQL, Postgres, H2), therefore the above requirement of storing
> structured data in Marmotta's backend is fulfilled. Please correct me if
> I'm wrong.

No, the KiWi triple store doesn't manage simple structured files (e.g. CSV),
but only instances of triples. The storage I have in mind is quite simple;
even the file system could be used, precisely so that the dataset can be
retrieved from Marmotta at a later time, for example via dcat:downloadURL.
Clearly this dataset is a copy of the one processed with Refine, and could
be overwritten at any time.

> In summary I think we are looking at 2 main tasks now.
> 1. Ability to import data from an OpenRefine process

Yes, and in addition to linked datasets (4 and 5 stars), also structured
datasets in simpler formats (CSV, etc.), supplied for example with DCAT
metadata.

> 2. Ability to configure the imported OpenRefine data to be interoperable
> with other datasets in Marmotta (potentially using the DCAT vocab)

Yes, with the possibility to retrieve them from Marmotta, so also 3-star
datasets.

> More ideas and suggestions are most welcome.

Before you prepare the proposal, we should seek advice from the Marmotta
team.

> Thanks,
> Dileepa

Regards, Raffaele.

> > What do you think about it?
> > Regards,
> > Raffaele.
> >
> > [1] http://www.w3.org/TR/vocab-dcat/
> > [2] https://github.com/fadmaa/grefine-rdf-extension/releases/tag/v0.9.0
> >
> > On 11 March 2014 10:29, Dileepa Jayakody <[email protected]>
> > wrote:
> >
> > > Thank you very much Raffaele for the detailed explanation.
> > >
> > > I will do some more background research on Marmotta data import and
> > > OpenRefine, and come up with questions and ideas.
> > >
> > > Also, any new suggestions or directions to evolve this project idea
> > > are welcome.
> > > Thanks,
> > > Dileepa
> > >
> > > On Tue, Mar 11, 2014 at 3:14 AM, Raffaele Palmieri <
> > > [email protected]> wrote:
> > >
> > > > Hi Dileepa,
> > > > pleased to meet you, and good to know of your interest in
> > > > contributing to Marmotta.
> > > > As discussed on Marmotta's mailing list, this integration could be
> > > > achieved at various levels.
> > > > A first level is reached by refining your messy data with the Refine
> > > > tools, using the RDF extension, which already offers a graphical UI
> > > > to model RDF data by producing an RDF skeleton, and then importing
> > > > the new data into Marmotta, compliant with the created skeleton.
> > > > This integration mode was implemented in the past using [1], but
> > > > needs to be updated because:
> > > > 1) Google Refine became OpenRefine
> > > > 2) LMF became Marmotta in its linked-data core functionalities
> > > > This update also requires work on the project configuration, because
> > > > OpenRefine has a different configuration than Apache Marmotta.
> > > > Whatever kind of integration is achieved, I think work on the
> > > > project configuration will be required.
> > > > A second level of integration is reached if you break up the RDF
> > > > into a CSV and a set of RDF mappings (aka the RDF skeleton).
> > > > So, starting from an exported project that contains the CSV and the
> > > > related actions to produce the RDF skeleton, the integration is
> > > > expected to produce the final RDF in Marmotta's world, probably
> > > > performing steps similar to those of the GRefine RDF Extension.
> > > > For this second level of integration, the export functionality and
> > > > the RDF skeleton should be explored to verify what is easily
> > > > exportable.
> > > > At the moment these are the hypotheses for the integration; clearly
> > > > the second appears to be more complex, but the first also involves
> > > > non-trivial work.
> > > > Since you have experience on other Semantic Web projects, such as
> > > > Apache Stanbol, feel free to propose other integration hypotheses.
> > > > Regards,
> > > > Raffaele.
> > > > [1] https://code.google.com/p/lmf/wiki/GoogleRefineExtension
> > > >
> > > > On 10 March 2014 21:35, Dileepa Jayakody <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'm Dileepa, a research student from the University of Moratuwa,
> > > > > Sri Lanka, with a keen interest in the linked-data and
> > > > > semantic-web domains. I have worked on linked-data related
> > > > > projects such as Apache Stanbol, and I'm experienced with related
> > > > > technologies like RDF, SPARQL, FOAF etc. I'm very much interested
> > > > > in applying for GSoC this year with Apache Marmotta.
> > > > >
> > > > > I would like to open up a discussion on the OpenRefine
> > > > > integration project idea [1]. AFAIU, the goal of this project is
> > > > > to impo
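[Editor's note: the second integration level described in the thread — breaking an OpenRefine project into a CSV plus a set of RDF mappings (the "RDF skeleton") — can be sketched roughly as below. This is an illustrative approximation in Python, not the GRefine RDF Extension's actual algorithm; the column-to-predicate mapping, URIs, and sample data are all invented.]

```python
import csv
import io

# Rough sketch of applying an "RDF skeleton" (here simplified to a plain
# column -> predicate mapping) to CSV rows, emitting N-Triples. The real
# GRefine RDF Extension skeleton is richer (nested nodes, GREL
# expressions); this only illustrates the data flow CSV -> triples.

SKELETON = {
    # hypothetical mapping: CSV column -> RDF predicate URI
    "name":    "http://xmlns.com/foaf/0.1/name",
    "country": "http://example.org/vocab/country",
}

def csv_to_ntriples(csv_text, subject_base, skeleton):
    """Yield one N-Triples line per mapped cell, one subject per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        subject = f"<{subject_base}/row{i}>"
        for column, predicate in skeleton.items():
            value = row.get(column, "").strip()
            if value:  # skip empty cells, as a skeleton mapping would
                yield f'{subject} <{predicate}> "{value}" .'

data = "name,country\nTurin,Italy\nColombo,Sri Lanka\n"
for triple in csv_to_ntriples(data, "http://example.org/resource", SKELETON):
    print(triple)
```

In the integration discussed above, the resulting triples would be pushed into Marmotta, while the source CSV itself would also be stored (and described with DCAT) so that the 3-star dataset remains retrievable.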
