Re: Loading JSON-LD into ES

Alfredo Serafini Sat, 27 Sep 2014 07:51:11 -0700

Hi Jörg, indeed! :-)

What I like about _mapping is that mappings are managed as documents too, 
and they can be:


   1. automatically inferred from data (risky, but useful)
   2. provided by static files, in some cases
   3. managed per _index/_type

All of those things could be done with something like a _context (which 
would initially include a single @context). The first point should probably 
be avoided entirely for JSON-LD :-), but it should still be possible.

But we may need several @context items for a single "resource" schema 
(i.e. for an _index/_type), and in the long run it should even be possible 
to reuse a @context across different _index/_type pairs.
Furthermore, when exposing results as JSON-LD one might want to reference an 
external @context and merge it in before returning results. In my opinion 
the riskiest part is ingesting the original JSON-LD: if we want to flatten 
it, we have to extract the @context that will let us reconstruct the 
original document later.
Given that every kind of JSON result from ES could be mapped this way, 
documents imported as JSON-LD may have to keep at least their original 
fields.
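A minimal sketch of the _context idea in Python, assuming a plain in-memory dict stands in for the per-_index/_type context store (`CONTEXT_STORE`, `ingest` and `restore` are hypothetical names, not an ES API). On ingest the @context is split off and remembered for the _index/_type pair while all other fields are kept unchanged; on output the stored context is merged back in.

```python
import copy

# Hypothetical "_context" store keyed by (_index, _type); in ES this could be
# kept as a document of its own, much like a _mapping.
CONTEXT_STORE = {}


def ingest(index, doc_type, jsonld_doc):
    """Split off @context, remember it for _index/_type, return the flat source."""
    doc = copy.deepcopy(jsonld_doc)
    context = doc.pop("@context", None)
    if context is not None:
        contexts = CONTEXT_STORE.setdefault((index, doc_type), [])
        if context not in contexts:  # several contexts per _index/_type allowed
            contexts.append(context)
    return doc  # all original fields are kept


def restore(index, doc_type, es_source):
    """Re-attach the stored @context(s) to a document read back from _source."""
    doc = copy.deepcopy(es_source)
    contexts = CONTEXT_STORE.get((index, doc_type), [])
    if len(contexts) == 1:
        doc["@context"] = contexts[0]
    elif contexts:
        # JSON-LD accepts an array of contexts; assumes each one is a map or IRI.
        doc["@context"] = contexts
    return doc
```

For example, ingesting `{"@context": {"dc": "http://purl.org/dc/terms/"}, "@id": "http://example.org/book/1", "dc:title": "Moby Dick"}` into a hypothetical "library"/"book" pair and restoring the result gives back an equal document. Persisting the store as an ES document is left open here.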

I'd like to put some code on GitHub; if you want, we could join forces on 
that. I'm working mostly in Scala at the moment. What do you think?




On Friday, 26 September 2014 20:32:52 UTC+2, Jörg Prante wrote:
>
> Absolutely. My idea is to manage one (or more) context documents in ES 
> JSON, where all the @context definitions of an index live. A format plugin 
> can then process search results and convert ES JSON to expanded JSON-LD, 
> and from there to other RDF serializations.
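To make the "ES JSON to expanded JSON-LD" step concrete, here is a deliberately simplified Python sketch, assuming the stored context is a flat map from prefixes and terms to IRI strings. It only rewrites property names to full IRIs; the real JSON-LD Expansion Algorithm also handles @vocab, type coercion, value objects and array wrapping, so an actual format plugin would rely on a proper JSON-LD processor.

```python
def expand_iri(term, context):
    """Expand a term or compact IRI (prefix:suffix) using a flat @context map."""
    if term.startswith("@"):
        return term  # keywords are left alone
    if isinstance(context.get(term), str):
        return context[term]  # a term defined directly in the context
    prefix, sep, suffix = term.partition(":")
    if sep and not suffix.startswith("//") and prefix in context:
        return context[prefix] + suffix  # compact IRI such as dc:title
    return term  # absolute IRIs and unknown terms pass through unchanged


def expand_doc(es_doc, context):
    """Recursively rewrite the keys of an ES JSON document to full IRIs."""
    if isinstance(es_doc, list):
        return [expand_doc(item, context) for item in es_doc]
    if isinstance(es_doc, dict):
        return {expand_iri(k, context): expand_doc(v, context)
                for k, v in es_doc.items()}
    return es_doc
```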
>
> Jörg
>
> On Fri, Sep 26, 2014 at 6:23 PM, Alfredo Serafini <[email protected]> 
> wrote:
>
>> Hi 
>>
>> Using JSON-LD is indeed rather simple: since it is JSON, it's even 
>> possible to index it as is.
>> I'm currently using ES to store RDF documents as JSON-LD in a dedicated 
>> index: in that case one can simply use the URI as the _id, recover the 
>> full original document from _source, and use basic search capabilities on 
>> the index, provided that escaping/nesting is not a big deal.
>>
>> However, to use resources with more flexibility, I think the best 
>> approach would be to index them as "flat" as possible, then apply an 
>> ad-hoc @context to the ES JSON to get back the original JSON-LD.
>> This would be my ideal usage at the moment. It seems complex at first, but 
>> it's not: I'm currently experimenting with saving a @context per _type, 
>> obtaining, let's say, a sort of _context, similar to a _mapping, to 
>> reconstruct the original semantics.
>> If anyone likes the idea, I'd be happy to share thoughts on it :-)
>>
>>
>> On Friday, 26 September 2014 14:08:07 UTC+2, Jörg Prante wrote:
>>>
>>> Lukáš,
>>>
>>> of course you are right, RDF/XML looks complex and requires parsing. The 
>>> underlying principle of all RDF is a graph (or a series of triples of the 
>>> form subject/predicate/object, where the triple series is a serialization 
>>> of the graph). So the challenge is, first, parsing the RDF input; second, 
>>> constructing the model; and third, serializing the model to an ES-friendly 
>>> input (here: JSON-LD, sort of). RDF ensures that there is a single model 
>>> for all serializations.
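A minimal Python sketch of the second and third steps (building the model and serializing it to ES-friendly JSON), assuming the parsing step has already produced plain (subject, predicate, object) string tuples; a real pipeline would obtain them from an RDF parser such as Jena or the xbib RDF API mentioned in this thread. IRIs and literals are not distinguished here, objects are assumed to be scalars, and the prefixed names in the test data are illustrative.

```python
from collections import defaultdict


def triples_to_docs(triples):
    """Group (subject, predicate, object) triples into one dict per subject.

    The subject becomes "@id"; a predicate seen more than once for the
    same subject collects its objects in a list.
    """
    docs = defaultdict(dict)  # insertion-ordered, so subjects keep input order
    for subject, predicate, obj in triples:
        node = docs[subject]
        node["@id"] = subject
        if predicate not in node:
            node[predicate] = obj
        elif isinstance(node[predicate], list):
            node[predicate].append(obj)
        else:
            node[predicate] = [node[predicate], obj]
    return list(docs.values())
```

Each resulting dict is a candidate ES document whose @id can serve as the _id, which matches mapping RDF subject IDs to ES doc IDs.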
>>>
>>> This technical perspective does not necessarily solve all challenges 
>>> that are inherent to the chosen data model, for example nested resources 
>>> in RDF. It might be feasible to flatten nested resources by their 
>>> identifiers and generate one JSON document after the other. Or it could be 
>>> feasible to keep nested resources intact and wrap them into nested 
>>> structures in a single ES JSON object.
>>>
>>> In my data model, I can map RDF subject IDs to ES doc IDs. Other data 
>>> models may prefer other approaches to select ES doc IDs.
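For the first option (flattening nested resources by their identifiers), a minimal Python sketch, assuming JSON-LD input whose nested node objects may or may not carry an @id. It is a simplified take on the idea behind the JSON-LD Flattening Algorithm, not that algorithm: it does not merge a node that appears more than once. The generated blank-node ids (`_:b0`, ...) are an illustrative assumption and are only stable within one run, so they make poor ES _ids across re-indexing.

```python
import itertools

# Value shapes that are not node objects and must not be split off.
NON_NODE_KEYS = {"@value", "@list", "@set"}


def flatten_document(doc):
    """Split a JSON-LD node with nested nodes into a list of flat nodes.

    Each nested node object is replaced by an {"@id": ...} reference and
    emitted as a document of its own; nodes without an @id receive a
    generated blank-node id ("_:b0", "_:b1", ...).
    """
    out = []
    _flatten(doc, out, itertools.count())
    return out


def _is_nested_node(value):
    # A dict that is neither a value/list object nor a bare {"@id": ...} reference.
    return (isinstance(value, dict)
            and not (NON_NODE_KEYS & value.keys())
            and set(value) != {"@id"})


def _flatten(node, out, counter):
    node_id = node.get("@id") or f"_:b{next(counter)}"
    flat = {"@id": node_id}
    out.append(flat)  # parent first, then its nested nodes
    for key, value in node.items():
        if key.startswith("@"):
            if key != "@id":
                flat[key] = value  # keywords such as @type are copied as-is
            continue
        values = value if isinstance(value, list) else [value]
        refs = [_flatten(v, out, counter) if _is_nested_node(v) else v
                for v in values]
        flat[key] = refs if isinstance(value, list) else refs[0]
    return {"@id": node_id}
```

With this, a FOAF Person nested inside another (with an @id) and the same Person given only as a reference end up in the same flat shape.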
>>>
>>> Jörg
>>>
>>>
>>>
>>> On Fri, Sep 26, 2014 at 10:11 AM, Lukáš Vlček <[email protected]> 
>>> wrote:
>>>
>>>> Jörg,
>>>>
>>>> my concern is that RDF/XML allows one thing to be expressed in several 
>>>> ways. For example, in the FOAF specification there are several ways to 
>>>> express that one Person knows another Person: one way uses reference 
>>>> IDs, another nests one Person inside the other. See [1] for examples. My 
>>>> understanding is that although both ways express exactly the same 
>>>> information, they lead to different XML representations and thus to 
>>>> different JSON-LD. It's not that you can't push such data into ES, but I 
>>>> wonder whether you can then query such data in any consistent way.
>>>>
>>>> Maybe there is some way to preprocess the XML document and convert all 
>>>> nested Persons to references (which would require constructing arbitrary 
>>>> IDs?), or something similar. Though I am not sure this would be a 
>>>> generally applicable approach for any RDF data.
>>>>
>>>> [1] http://www.xml.com/pub/a/2004/02/04/foaf.html
>>>>
>>>> Regards,
>>>> Lukas
>>>>
>>>> On Fri, Sep 26, 2014 at 9:28 AM, [email protected] <[email protected]>
>>>> wrote:
>>>>
>>>>> JSON-LD is perfect for ES indexing, as long as you use the "compact" 
>>>>> form of representation. 
>>>>>
>>>>> http://www.w3.org/TR/json-ld-api/#compaction-algorithms
>>>>>
>>>>> Example: 
>>>>>
>>>>> https://github.com/lanthaler/JsonLD/blob/master/Test/Fixtures/sample-compacted.jsonld
>>>>>
>>>>> This means you should use short field names and shorten IRIs to a 
>>>>> prefix form. This gives a convenient mapping to ES field names (e.g. 
>>>>> "dc:title" or "dc:creator"). The '@' fields can also be indexed, and 
>>>>> they do not control anything special in ES (an @id may be mapped to the 
>>>>> ES _id, but for nested structures this does not work).
>>>>>
>>>>> I use my own RDF API and transform RDF graphs (so not only JSON-LD but 
>>>>> also other formats like N-Triples and RDF/XML) into XContent using this 
>>>>> method:
>>>>>
>>>>> https://github.com/xbib/xbib/blob/master/content/src/main/java/org/xbib/rdf/content/DefaultResourceContentBuilder.java
>>>>>
>>>>> I plan to extend this content building by interpreting rdf:type, 
>>>>> rdf:list, etc. to generate correct ES JSON objects and arrays. There is 
>>>>> also a good deal of work left to do for the plethora of XSD types in RDF 
>>>>> literals and for language tags.
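One possible way to handle typed literals and language tags, sketched in Python and assuming expanded value objects whose @type is a full XSD datatype IRI. The converter table and the `{"value", "lang"}` output shape for language-tagged strings are hypothetical choices for illustration, not what xbib does; unknown datatypes such as xsd:date pass through as strings so an ES mapping can decide how to index them.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical table: XSD datatype IRI -> function parsing the lexical form.
CONVERTERS = {
    XSD + "integer": int,
    XSD + "int": int,
    XSD + "long": int,
    XSD + "double": float,
    XSD + "boolean": lambda s: s.strip() in ("true", "1"),
}


def literal_to_es(value):
    """Turn a JSON-LD value object into an ES-friendly JSON value."""
    if not isinstance(value, dict) or "@value" not in value:
        return value  # plain JSON values need no conversion
    raw = value["@value"]
    if "@language" in value:
        # Keep the tag next to the text, e.g. for per-language analysis.
        return {"value": raw, "lang": value["@language"]}
    converter = CONVERTERS.get(value.get("@type"))
    if converter is not None and isinstance(raw, str):
        return converter(raw)
    return raw  # other datatypes (xsd:date, custom types) stay as strings
```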
>>>>>
>>>>> This will be subsumed into an RDF input/output plugin for an ES-based 
>>>>> Linked Data Platform 
>>>>>
>>>>> http://www.w3.org/TR/ldp/
>>>>>
>>>>> but there is no ETA yet.
>>>>>
>>>>> Jörg
>>>>>
>>>>>
>>>>> On Fri, Sep 26, 2014 at 5:08 AM, Lukáš Vlček <[email protected]> 
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I think you will have to preprocess the documents on your side first 
>>>>>> and then push them into ES individually (you can push them in batches).
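For the original question, a minimal Python sketch of that preprocessing step, assuming a Jena-style file with a top-level @graph array (and optionally a @context): each @graph node becomes one bulk "index" action, with the node's @id as the ES _id. The index and type names are placeholders; the action line includes _type as ES 1.x expected, whereas newer Elasticsearch releases have removed mapping types, so drop `_type` there.

```python
import json


def graph_to_bulk(jsonld, index, doc_type):
    """Turn a JSON-LD document with a top-level @graph into a _bulk body.

    Each @graph node becomes one "index" action; its @id (if any) is used
    as the ES _id, and the top-level @context is copied into each document.
    """
    context = jsonld.get("@context")
    lines = []
    for node in jsonld.get("@graph", []):
        meta = {"_index": index, "_type": doc_type}  # drop _type on newer ES
        if "@id" in node:
            meta["_id"] = node["@id"]
        source = dict(node)
        if context is not None:
            source["@context"] = context
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(source))
    # The bulk API expects newline-delimited JSON, including a final newline.
    return "".join(line + "\n" for line in lines)
```

The output can be written to a file and sent to the `_bulk` endpoint, e.g. `curl -XPOST 'localhost:9200/_bulk' --data-binary @bulk.ndjson` (recent versions also expect a `Content-Type: application/x-ndjson` header). Copying the @context into every document keeps each one self-describing; storing it once per index is the alternative.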
>>>>>>
>>>>>> As a side note, I would say JSON-LD is quite a low-level serialization 
>>>>>> of RDF data and, IMO, not optimal for ES indexing. It might be better to 
>>>>>> find some RDF-OOM tool, have your RDF documents mapped to Java POJOs, and 
>>>>>> serialize the POJOs to JSON instead (you can use the Jackson library for 
>>>>>> that, for example). This will give you better control over the whole 
>>>>>> RDF -> JSON conversion process.
>>>>>>
>>>>>> Regards,
>>>>>> Lukas
>>>>>>
>>>>>> On Thu, Sep 25, 2014 at 7:21 PM, abo <[email protected]> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm new to Elasticsearch, so forgive me if this is a basic question 
>>>>>>> or if it's in some documentation that I haven't read...
>>>>>>>
>>>>>>> I am trying to load a json-ld file into ES. The json-ld file was 
>>>>>>> generated from an RDF file, using Jena. The structure starts with:
>>>>>>>
>>>>>>> {
>>>>>>>   "@graph" :
>>>>>>>
>>>>>>> followed by the individual "documents", each with:
>>>>>>>
>>>>>>> {
>>>>>>>     "@id" :
>>>>>>>
>>>>>>> and a variable number of parameters in each.
>>>>>>>
>>>>>>> My question is how do I load this into ES and ensure that documents 
>>>>>>> are individually referenced (as opposed to the entire json-ld file)?
>>>>>>>
>>>>>>> Do I need to doctor this json-ld file further in order to load it?
>>>>>>>
>>>>>>> Thanks for your help.
>>>>>>>
>>>>>>> -- abo
>>>>>>>
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "elasticsearch" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to [email protected].
>>>>>>> To view this discussion on the web visit 
>>>>>>> https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-
>>>>>>> 5bb1-4c50-96c4-8f586e1e0807%40googlegroups.com 
>>>>>>> <https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb1-4c50-96c4-8f586e1e0807%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
>
