Re: Loading JSON-LD into ES

[email protected] Sat, 27 Sep 2014 23:44:37 -0700

In your case, you have to generate the ES bulk format for efficient indexing of
such files or, better, use the ES Python client to push the JSON-LD
into ES.
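Here is a minimal sketch of the bulk route. It assumes the JSON-LD nodes have already been split out of `@graph` and that each one carries an `@id`. The function name `to_bulk_ndjson` and the index and type names are hypothetical. `_type` follows the ES 1.x API discussed in this thread; later Elasticsearch versions removed mapping types. The body alternates one action line with one source line and must end with a newline.

```python
import json

def to_bulk_ndjson(docs, index, doc_type):
    """Build an Elasticsearch bulk request body from JSON-LD node objects.

    Each node's "@id" is reused as the ES _id. `index` and `doc_type`
    are caller-supplied (hypothetical) names.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["@id"]}}
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(doc))     # document source line
    # The bulk API expects newline-delimited JSON terminated by a newline.
    return "\n".join(lines) + "\n"
```

The resulting string can be POSTed to the `_bulk` endpoint. The official Python client also provides `elasticsearch.helpers.bulk`, which takes action dictionaries instead of a raw body.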


Jörg

On Sun, Sep 28, 2014 at 4:23 AM, abo <[email protected]> wrote:

> Thank you all for your responses and interesting conversation about RDF
> serialization into ES. With regards to my original post, I ended up using
> a solution based on RDFlib:
>
> https://github.com/RDFLib/rdflib-jsonld
>
> It works as expected, and compacting the content by using @context does
> the trick and is flexible. It is an in-memory process, however, which could
> be an issue for those with very large RDF files. When using Jena, I didn't
> find a way to add @context mappings, but maybe I didn't dig deep enough.
>
> On a side note, it looks like the rdflib-jsonld solution already supports
> XSD literals and lists, so perhaps it could be extended to map directly
> into ES _type, if that is a good direction.
>
> With my JSON-LD file ready for ingestion into ES, I have another
> question: are there utilities to bulk load such documents (the JSON-LD
> contains individual ES documents, each with an _id), or do I just write
> a script that calls curl -XPUT for each record in the JSON-LD file? This seems
> like a pretty common use case.
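If you do go the one-request-per-record way, note that an IRI used as `_id` has to be percent-encoded before it goes into the URL path. The sketch below assumes a list of nodes with an `@id` on each entry. `put_requests`, the base URL and the index and type names are illustrative only. Each pair costs a separate HTTP round trip, which is why the bulk API is usually preferred for large files.

```python
import json
from urllib.parse import quote

def put_requests(docs, base_url, index, doc_type):
    """Yield one (url, json_body) pair per JSON-LD node for an HTTP PUT.

    The node's "@id" IRI contains ':' and '/', so it is percent-encoded
    with safe="" before being used as the _id path segment.
    """
    for doc in docs:
        doc_id = quote(doc["@id"], safe="")
        yield f"{base_url}/{index}/{doc_type}/{doc_id}", json.dumps(doc)
```

Each `(url, body)` pair can then be handed to curl or any HTTP client.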
>
> Thanks again to all, interesting stuff. Happy to contribute to extending
> an existing solution.
>
> -- ab
>
> On Saturday, September 27, 2014 9:24:24 AM UTC-7, Jörg Prante wrote:
>>
>> For the _mapping, I am thinking about two more types for which I intend to
>> write ES type mappers, "iri" and "literal", so that ES can receive XSD data
>> types and language codes and map them to fields/analyzers. IRIs are just
>> opaque strings, but they can be shortened if a prefix is configured, and they
>> can be used as _id or for referencing an _id.
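Jörg's "iri" and "literal" mappers would be server-side ES plugins. What follows is only a client-side Python sketch of the same idea for literals, and it rests on hypothetical assumptions: the `COERCE` table, the `literal_field` name and the `<field>_<lang>` naming convention are all illustrative.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical coercions from XSD datatype IRIs to JSON-friendly values.
COERCE = {
    XSD + "integer": int,
    XSD + "decimal": float,
    XSD + "double": float,
    XSD + "boolean": lambda s: s in ("true", "1"),
}

def literal_field(name, value):
    """Map an expanded JSON-LD value object to an (ES field name, value) pair.

    Language-tagged strings go to a per-language field such as "title_de",
    so each such field can be given its own analyzer in the ES mapping.
    Typed literals are coerced via COERCE; anything else passes through.
    """
    if "@language" in value:
        return f"{name}_{value['@language']}", value["@value"]
    convert = COERCE.get(value.get("@type"), lambda v: v)
    return name, convert(value["@value"])
```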
>>
>> Instead of _mapping, I prefer the idea of handling @contexts like
>> template documents.
>>
>> I'm not sure about the best way to manage JSON-LD. There are two approaches.
>> The first is to save the JSON-LD (what you call the original document)
>> beside other versions. This requires more space, and I'm not sure about the
>> purpose of keeping the original JSON-LD. The other approach is to drop the
>> original JSON-LD after parsing it into triples and store the triples in an
>> ES JSON doc, which is a surrogate close to JSON-LD but follows all the JSON
>> dialect characteristics of the ES document DSL.
>>
>> I don't work in Scala, so I cannot promise much, but I'm happy to look at
>> any related code!
>>
>> Jörg
>>
>>
>> On Sat, Sep 27, 2014 at 4:50 PM, Alfredo Serafini <[email protected]>
>> wrote:
>>
>>> Hi Jörg, indeed! :-)
>>>
>>> What I like about _mapping is that they are managed as documents too,
>>> and they can be:
>>>
>>>    1. automatically inferred from data (at risk, but useful)
>>>    2. provided by static files, in some cases
>>>    3. managed for _index/_types
>>>
>>> All those things could be done with something like a _context (which
>>> would at first include a single @context). The first point should probably
>>> be avoided entirely for JSON-LD :-), but it should be possible.
>>>
>>> But we may need more than one @context for a single "resource" schema
>>> (referring to _index/_type), and looking ahead it's even possible to
>>> reuse a @context for different _index/_type pairs.
>>> Furthermore, when exposing results in JSON-LD one might want to reference
>>> an external @context and merge it before providing results. In my
>>> opinion the more "risky" part is the input of the original JSON-LD, if we
>>> want to flatten it and extract the @context that will let us
>>> reconstruct the original document later.
>>> Given that it could be possible to map every kind of JSON
>>> result from ES, documents imported as JSON-LD might have to keep at
>>> least the original fields.
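One way to picture such a `_context` store is sketched below. It assumes contexts are kept in an in-memory dict keyed by `(_index, _type)` and merged in registration order, so one @context can be shared by several pairs. `register`, `effective_context` and the registry layout are hypothetical, and the merge is a simplification of JSON-LD context processing.

```python
from typing import Dict, List, Tuple

# Hypothetical "_context" store: @context objects kept per (_index, _type),
# analogous to how a _mapping is kept per index and type.
ContextRegistry = Dict[Tuple[str, str], List[dict]]

def register(registry: ContextRegistry, index: str, doc_type: str, context: dict) -> None:
    """Attach one more @context to an index/type pair (contexts may be shared)."""
    registry.setdefault((index, doc_type), []).append(context)

def effective_context(registry: ContextRegistry, index: str, doc_type: str) -> dict:
    """Merge the registered contexts in order; a later term definition wins.

    Simplified: remote (URL) contexts and null resets, which real JSON-LD
    context processing handles, are not modelled here.
    """
    merged: dict = {}
    for ctx in registry.get((index, doc_type), []):
        merged.update(ctx)
    return merged
```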
>>>
>>> I'd like to put some code on GitHub; if you want, we could join
>>> efforts on that. I'm working mostly in Scala at the moment. What do you
>>> think?
>>>
>>>
>>>
>>>
>>> On Friday, September 26, 2014 8:32:52 PM UTC+2, Jörg Prante
>>> wrote:
>>>>
>>>> Absolutely. My thought is about managing one (or more) context ES JSON
>>>> document(s) where all the @context definitions of an index live. A format
>>>> plugin can then process search results and convert ES JSON to expanded
>>>> JSON-LD and from there to other RDF serializations.
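The "ES JSON to expanded JSON-LD" step of such a format plugin could look roughly like the following. This is a simplified sketch and not the JSON-LD expansion algorithm. It assumes keys are CURIEs that can be resolved through a plain prefix table, and `expand_doc` is a hypothetical name.

```python
def expand_doc(source, prefixes):
    """Expand a compact ES _source into (simplified) expanded JSON-LD.

    CURIE keys such as "dc:title" are expanded via `prefixes`; every value
    becomes an array, and plain values are wrapped as {"@value": ...}.
    Dicts (e.g. {"@id": ...} references) are kept as they are.
    Full JSON-LD expansion (term definitions, @vocab, type coercion) is not covered.
    """
    expanded = {}
    for key, value in source.items():
        if key.startswith("@"):
            expanded[key] = value  # keywords such as @id pass through
            continue
        prefix, sep, local = key.partition(":")
        iri = prefixes[prefix] + local if sep and prefix in prefixes else key
        values = value if isinstance(value, list) else [value]
        expanded[iri] = [v if isinstance(v, dict) else {"@value": v} for v in values]
    return expanded
```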
>>>>
>>>> Jörg
>>>>
>>>> On Fri, Sep 26, 2014 at 6:23 PM, Alfredo Serafini <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Using JSON-LD is indeed rather simple, as it is JSON, so it's
>>>>> even possible to index it as is.
>>>>> I'm currently using ES for storing RDF documents in JSON-LD in a
>>>>> specific index: in that case one can simply use the URI as the _id, recover
>>>>> the full original format from _source, and use basic search capabilities on
>>>>> the index, if escaping/nesting is not a big deal.
>>>>>
>>>>> However, in order to use resources with more flexibility, I think
>>>>> the best approach would be to index them as "flat" as possible, then use an
>>>>> ad-hoc @context on the ES JSON to obtain the original JSON-LD again.
>>>>> This would be my ideal usage at the moment: it seems complex at first,
>>>>> but it's not. I'm currently experimenting with saving the @context for a
>>>>> _type, obtaining, let's say, a sort of _context, similar to a _mapping, to
>>>>> reconstruct the original semantics.
>>>>> If someone likes the idea, I'd like to share thoughts on that :-)
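Here is a sketch of the "index flat, re-attach @context" round trip. It assumes the stored context for a `_type` is fetched from somewhere else, for example a `_context` document like the one discussed in this thread. `strip_context` and `restore_jsonld` are hypothetical helpers.

```python
def strip_context(doc):
    """Split a compacted JSON-LD document into (context, flat ES _source)."""
    source = {k: v for k, v in doc.items() if k != "@context"}
    return doc.get("@context", {}), source

def restore_jsonld(source, context):
    """Re-attach a stored @context to an ES _source to get JSON-LD back."""
    return {"@context": context, **source}
```

Only `source` would be indexed. The context is stored once per `_type` rather than in every document.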
>>>>>
>>>>>
>>>>> On Friday, September 26, 2014 2:08:07 PM UTC+2, Jörg Prante
>>>>> wrote:
>>>>>>
>>>>>> Lukáš,
>>>>>>
>>>>>> of course you are right, RDF/XML looks complex and requires parsing.
>>>>>> The underlying principle of all RDF is a graph (or a series of triples in
>>>>>> the form subject/predicate/object, where the triple series is a
>>>>>> serialization of the graph). So the challenge is, first, parsing the RDF
>>>>>> input; second, constructing the model; and third, serializing the model
>>>>>> to an ES-friendly input (here: JSON-LD, sort of). RDF ensures that there is
>>>>>> a single model for all serializations.
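To make the first step concrete, here is a deliberately minimal N-Triples line parser. It assumes IRIs, simple blank-node labels and literals without escaped quotes. It is a sketch only, and a real pipeline would use an RDF library for parsing (such as the rdflib mentioned later in this thread).

```python
import re

# Deliberately minimal: IRIs, simple blank nodes, literals without escaped quotes.
TERM = r'(<[^>]*>|_:[A-Za-z0-9_-]+|"[^"]*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?)'
TRIPLE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')

def parse_ntriples(text):
    """Yield (subject, predicate, object) string tuples from N-Triples text."""
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        yield match.groups()
```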
>>>>>>
>>>>>> This technical perspective does not necessarily solve all challenges
>>>>>> that are inherent to the chosen data model, for example nested resources
>>>>>> in RDF. It might be feasible to flatten nested resources by their
>>>>>> identifiers and generate one JSON document after the other. Or it could be
>>>>>> feasible to keep nested resources intact and wrap them into nested
>>>>>> structures in a single ES JSON object.
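The first option (flatten nested resources by identifier and emit one JSON document after the other) can be sketched as follows. This is a simplified take on the idea behind JSON-LD flattening, not the W3C flattening algorithm. The generated `_:bN` identifiers are an assumption and are only stable within a single run. The output is the same whether a FOAF Person is nested or referenced, which bears on the consistency concern raised further down in this thread.

```python
import itertools

def flatten(node):
    """Flatten a nested JSON-LD node object into a list of top-level nodes.

    Each nested node object is emitted as its own node and replaced by an
    {"@id": ...} reference. Nested nodes without "@id" get a generated
    blank-node id ("_:b0", "_:b1", ...). Value objects ({"@value": ...}) and
    bare references ({"@id": ...} only) are left untouched.
    """
    nodes = []
    ids = itertools.count()

    def ref(value):
        if isinstance(value, dict) and "@value" not in value and set(value) != {"@id"}:
            return visit(value)
        return value

    def visit(obj):
        node_id = obj.get("@id") or f"_:b{next(ids)}"
        flat = {"@id": node_id}
        nodes.append(flat)  # parent is appended before its children
        for key, value in obj.items():
            if key == "@id":
                continue
            if key.startswith("@"):
                flat[key] = value  # keywords such as @type are copied as-is
            else:
                flat[key] = [ref(v) for v in value] if isinstance(value, list) else ref(value)
        return {"@id": node_id}

    visit(node)
    return nodes
```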
>>>>>>
>>>>>> In my data model, I can map RDF subject IDs to ES doc IDs. Other data
>>>>>> models may prefer other approaches to select ES doc IDs.
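Mapping RDF subject IDs to ES doc IDs can be sketched like this. The sketch assumes triples are already parsed into `(subject, predicate, object)` tuples with compact keys. `triples_to_docs` is a hypothetical name, and collapsing single values while keeping repeated predicates as arrays is one possible design choice, not necessarily Jörg's.

```python
from collections import defaultdict

def triples_to_docs(triples):
    """Group (subject, predicate, object) triples into one ES doc per subject.

    The subject becomes the document _id; a predicate that occurs more than
    once for a subject becomes a JSON array, otherwise a single value.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    docs = []
    for subject, props in grouped.items():
        source = {p: vals[0] if len(vals) == 1 else vals for p, vals in props.items()}
        docs.append({"_id": subject, "_source": source})
    return docs
```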
>>>>>>
>>>>>> Jörg
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 26, 2014 at 10:11 AM, Lukáš Vlček <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Jörg,
>>>>>>>
>>>>>>> my concern is that RDF/XML allows expressing one thing in several
>>>>>>> ways. For example, if you take the FOAF specification, there are several
>>>>>>> ways to express that one Person knows another Person. One way is
>>>>>>> using reference IDs; the other is nesting a Person inside another Person.
>>>>>>> See [1] for examples. My understanding is that although both ways express
>>>>>>> exactly the same information, they lead to different XML representations
>>>>>>> and thus to different JSON-LD. Not that you can't push such data into ES,
>>>>>>> but I wonder if you can then have any consistent way of querying such data.
>>>>>>>
>>>>>>> Maybe there is some way to preprocess the XML document and
>>>>>>> convert all nested Persons to references (which would require arbitrary
>>>>>>> ID construction?), or something similar. Though I am not sure this would
>>>>>>> be a generally applicable approach to any RDF data.
>>>>>>>
>>>>>>> [1] http://www.xml.com/pub/a/2004/02/04/foaf.html
>>>>>>>
>>>>>>> Regards,
>>>>>>> Lukas
>>>>>>>
>>>>>>> On Fri, Sep 26, 2014 at 9:28 AM, [email protected] <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> JSON-LD is perfect for ES indexing, as long as you use the
>>>>>>>> "compact" form of representation.
>>>>>>>>
>>>>>>>> http://www.w3.org/TR/json-ld-api/#compaction-algorithms
>>>>>>>>
>>>>>>>> Example:
>>>>>>>>
>>>>>>>> https://github.com/lanthaler/JsonLD/blob/master/Test/Fixtures/sample-compacted.jsonld
>>>>>>>>
>>>>>>>> This means you should use short field names and shorten IRIs to a
>>>>>>>> prefix form. This gives a convenient mapping to ES field names (e.g.
>>>>>>>> "dc:title" or "dc:creator"). The '@' fields can also be indexed, and they
>>>>>>>> do not control anything special in ES (some @id may be mapped to the ES
>>>>>>>> _id, but for nested structures this does not match).
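Shortening IRIs to a prefix form can be sketched as a longest-namespace match. This covers only the prefix part of JSON-LD compaction; term selection, `@vocab` and value compaction are not handled. `compact_iri`, `compact_keys` and the prefix table are assumptions for illustration.

```python
def compact_iri(iri, prefixes):
    """Shorten an IRI to "prefix:local" using the longest matching namespace.

    Returns the IRI unchanged when no configured namespace matches.
    """
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace) and (best is None or len(namespace) > len(prefixes[best])):
            best = prefix
    if best is None:
        return iri
    return f"{best}:{iri[len(prefixes[best]):]}"

def compact_keys(doc, prefixes):
    """Compact every non-keyword key of a node, giving ES-friendly field names."""
    return {k if k.startswith("@") else compact_iri(k, prefixes): v for k, v in doc.items()}
```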
>>>>>>>>
>>>>>>>> I use my own RDF API and transform RDF graphs (so not only JSON-LD
>>>>>>>> but also other formats like N-Triples and RDF/XML) into XContent using
>>>>>>>> this method:
>>>>>>>>
>>>>>>>> https://github.com/xbib/xbib/blob/master/content/src/main/java/org/xbib/rdf/content/DefaultResourceContentBuilder.java
>>>>>>>>
>>>>>>>> I plan to extend this content building by interpreting rdf:type and
>>>>>>>> rdf:list etc. to generate correct ES JSON objects and arrays. There is
>>>>>>>> also a fair amount of work left to do for the plethora of XSD types in
>>>>>>>> RDF literals and for language tags.
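Interpreting RDF lists means walking the `rdf:first`/`rdf:rest` chain of an RDF collection until `rdf:nil`, and that chain maps naturally onto a JSON array. The sketch below assumes triples with full IRIs and at most one `rdf:first`/`rdf:rest` per list node. `collection_to_array` is a hypothetical name.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def collection_to_array(head, triples):
    """Turn an RDF collection (rdf:first/rdf:rest chain) into a Python list.

    `triples` is an iterable of (subject, predicate, object) with full IRIs;
    the walk stops at rdf:nil and rejects cyclic chains.
    """
    index = {(s, p): o for s, p, o in triples}
    items, seen, node = [], set(), head
    while node != RDF + "nil":
        if node in seen:
            raise ValueError("cyclic rdf:rest chain")
        seen.add(node)
        items.append(index[(node, RDF + "first")])
        node = index[(node, RDF + "rest")]
    return items
```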
>>>>>>>>
>>>>>>>> This will be subsumed into an RDF input/output plugin for an
>>>>>>>> ES-based Linked Data Platform
>>>>>>>>
>>>>>>>> http://www.w3.org/TR/ldp/
>>>>>>>>
>>>>>>>> but there is no ETA yet.
>>>>>>>>
>>>>>>>> Jörg
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 26, 2014 at 5:08 AM, Lukáš Vlček <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I think you will have to preprocess documents on your side first
>>>>>>>>> and then push them into ES individually (you can push in batches).
>>>>>>>>>
>>>>>>>>> As a side note, I would say JSON-LD is quite a low-level
>>>>>>>>> serialization of RDF data and, IMO, not optimal for ES indexing. Maybe
>>>>>>>>> it would be better to find some RDF-OOM tool, map your RDF documents
>>>>>>>>> to Java POJOs, and serialize the POJOs into JSON instead (you can use
>>>>>>>>> the Jackson library for that, for example). This will give you better
>>>>>>>>> control over the whole RDF -> JSON conversion process.
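Here is the same object-mapping idea sketched in Python, with a dataclass standing in for a Java POJO and `json.dumps` standing in for Jackson. The `Person` class and `person_from_node` are hypothetical. The input is assumed to be a JSON-LD node with full FOAF property IRIs and plain string values.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

FOAF = "http://xmlns.com/foaf/0.1/"

@dataclass
class Person:
    """Hypothetical typed view of a foaf:Person, analogous to a Java POJO."""
    iri: str
    name: str
    knows: List[str] = field(default_factory=list)  # IRIs of known persons

def person_from_node(node):
    """Build a Person from a JSON-LD node with full FOAF IRIs (simplified)."""
    knows = node.get(FOAF + "knows", [])
    if not isinstance(knows, list):
        knows = [knows]
    return Person(iri=node["@id"], name=node[FOAF + "name"],
                  knows=[k["@id"] for k in knows])
```

`json.dumps(asdict(person))` then yields a JSON document whose shape is fixed by the class rather than by the RDF serialization.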
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Lukas
>>>>>>>>>
>>>>>>>>> On Thu, Sep 25, 2014 at 7:21 PM, abo <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I'm new to Elasticsearch, so forgive me if this is a basic
>>>>>>>>>> question or if it's in some documentation that I haven't read...
>>>>>>>>>>
>>>>>>>>>> I am trying to load a json-ld file into ES. The json-ld file was
>>>>>>>>>> generated from an RDF file, using Jena. The structure starts with:
>>>>>>>>>>
>>>>>>>>>> {
>>>>>>>>>>   "@graph" :
>>>>>>>>>>
>>>>>>>>>> followed by the individual "documents", each with:
>>>>>>>>>>
>>>>>>>>>> {
>>>>>>>>>>     "@id" :
>>>>>>>>>>
>>>>>>>>>> and a variable number of parameters in each.
>>>>>>>>>>
>>>>>>>>>> My question is how do I load this into ES and ensure that
>>>>>>>>>> documents are individually referenced (as opposed to the entire
>>>>>>>>>> json-ld file)?
>>>>>>>>>>
>>>>>>>>>> Do I need to doctor this json-ld file further in order to load it?
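Splitting the `@graph` into individually addressable documents can be sketched as follows. The sketch assumes the file has a top-level `@graph` array and an optional top-level `@context`. `split_graph` is a hypothetical name. Copying the context into each document keeps every document valid JSON-LD at the cost of repeating it; storing it once, as discussed elsewhere in this thread, is the alternative.

```python
import json

def split_graph(jsonld_text):
    """Split a JSON-LD document with a top-level "@graph" into standalone docs.

    Yields (id, doc) pairs, where id is the node's "@id" (None if absent).
    A top-level "@context", if present, is copied into every document.
    """
    data = json.loads(jsonld_text)
    context = data.get("@context")
    for node in data["@graph"]:
        doc = dict(node)
        if context is not None:
            doc["@context"] = context
        yield node.get("@id"), doc
```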
>>>>>>>>>>
>>>>>>>>>> Thanks for your help.
>>>>>>>>>>
>>>>>>>>>> -- abo
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "elasticsearch" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to [email protected].
>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb
>>>>>>>>>> 1-4c50-96c4-8f586e1e0807%40googlegroups.com
>>>>>>>>>> <https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb1-4c50-96c4-8f586e1e0807%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>> .
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>
