In your case, you have to convert the file into the ES bulk format to index it efficiently, or better, script the ES Python client to push the JSON-LD documents into ES.
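A minimal sketch of the bulk-format route, assuming the file has the top-level "@graph" array described in the original question and that each node's "@id" should become the ES _id. The function name, the index name "rdf" and the sample data are made up for illustration, and "_type" is only meaningful on older ES versions, so it is optional here.

```python
import json


def jsonld_graph_to_bulk(jsonld, index, doc_type=None):
    """Yield ES bulk lines: one action line, then one source line, per @graph node."""
    for node in jsonld.get("@graph", []):
        # Assumption: the JSON-LD @id is used directly as the ES document _id.
        action = {"_index": index, "_id": node["@id"]}
        if doc_type is not None:
            # Only older ES versions accept a _type in the action line.
            action["_type"] = doc_type
        yield json.dumps({"index": action})
        yield json.dumps(node)


# Hypothetical sample document in compacted JSON-LD form.
sample = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@graph": [
        {"@id": "http://example.org/doc/1", "dc:title": "First"},
        {"@id": "http://example.org/doc/2", "dc:title": "Second"},
    ],
}

# The bulk body is newline-delimited JSON and must end with a newline.
bulk_body = "\n".join(jsonld_graph_to_bulk(sample, "rdf")) + "\n"
print(bulk_body)
```

The resulting text can be sent to the _bulk endpoint in one request instead of one curl -XPUT per record. With the Python client, elasticsearch.helpers.bulk takes action dicts rather than raw lines, so the same loop would yield dicts there.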

Jörg

On Sun, Sep 28, 2014 at 4:23 AM, abo <[email protected]> wrote:

> Thank you all for your responses and interesting conversation about RDF
> serialization into ES. With regards to my original post, I ended up using
> a solution based on RDFlib:
>
> https://github.com/RDFLib/rdflib-jsonld
>
> It works as expected, and compacting the content by using @context does
> the trick and is flexible. It is an in-memory process however, which could
> be an issue for those with very large RDF files. When using Jena, I didn't
> find the ability to add @context mappings, but maybe I didn't dig enough.
>
> On a side note, looks like the rdflib-jsonld solution already has support
> for XSD literals and lists, so perhaps it could be extended to map directly
> into ES _type if that is a good direction.
>
> With my Json-ld file ready for ingestion into ES, I do have another
> question: are there utilities to bulk load such documents (the json-ld
> contains individual documents per ES, each with an _id), or do I just write
> a script that calls curl -XPUT for each record in the json-ld file? Seems
> like a pretty common use case.
>
> Thanks again to all, interesting stuff. Happy to contribute to extending
> an existing solution.
>
> -- ab
>
> On Saturday, September 27, 2014 9:24:24 AM UTC-7, Jörg Prante wrote:
>>
>> For the _mapping, I think about two more types for that I intend to write
>> ES type mappers, "iri" and "literal", so ES can receive XSD data types and
>> language codes and map them to fields / analyzers. IRIs are just opaque
>> strings but they can be shortened if prefix is configured and can be used
>> as _id or for referencing to an _id.
>>
>> Instead of _mapping I prefer the thought about handling @contexts like
>> template documents.
>>
>> Not sure about the best way to manage JSON-LD. There are two approaches:
>> save a JSON-LD (you say original document) beside other versions.
>> This requires more space and I'm not sure about the purpose of the original
>> JSON-LD. The other approach is more about dropping the original JSON-LD after
>> parsing it to triples and storing the triples in an ES JSON doc which is a
>> surrogate close to JSON-LD but arranges with all the JSON dialect
>> characteristics of the ES document DSL.
>>
>> I'm not into Scala, so I cannot promise much, but happy about glimpsing
>> all related code!
>>
>> Jörg
>>
>> On Sat, Sep 27, 2014 at 4:50 PM, Alfredo Serafini <[email protected]>
>> wrote:
>>
>>> Hi Jörg, indeed! :-)
>>>
>>> What I like about _mapping is that they are managed as documents too,
>>> and they can be:
>>>
>>> 1. automatically inferred from data (at risk, but useful)
>>> 2. provided by static files, in some cases
>>> 3. managed for _index/_types
>>>
>>> All those things could be done with something like a _context (which
>>> will include at first a single @context). The first point should probably
>>> be avoided altogether for json-ld :-), but it should be possible.
>>>
>>> But we may need more @context items for a single "resource" schema
>>> (referring to _index/_type), and in perspective it's even possible to
>>> re-use a @context for different _index/_type pairs.
>>> Furthermore: when exposing results in jsonld one might want to reference
>>> an external @context and merge it before providing results, and in my
>>> opinion the more "risky" part is when inputting the original json-ld, if we
>>> want to flatten it and extract the @context which will permit us to
>>> reconstruct the original document later.
>>> Given the fact that it could be possible to map every kind of json
>>> results from ES, documents imported as jsonld might have to maintain at
>>> least the original fields.
>>>
>>> I'd like to put some code on github and if you want we could join the
>>> effort on that? I'm working mostly on Scala at the moment. What do you
>>> think?
>>>
>>> On Friday, September 26, 2014 20:32:52 UTC+2, Jörg Prante wrote:
>>>>
>>>> Absolutely. My thought is about managing one (or more) context ES JSON
>>>> document(s) where all the @context definitions of an index live. A format
>>>> plugin can then process search results and convert ES JSON to expanded
>>>> JSON-LD and from there to other RDF serializations.
>>>>
>>>> Jörg
>>>>
>>>> On Fri, Sep 26, 2014 at 6:23 PM, Alfredo Serafini <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> using json-ld is indeed rather simple, as it is JSON, and then it's
>>>>> even possible to index it as is.
>>>>> I'm currently using ES for storing RDF documents in json-ld on a
>>>>> specific index: in that case one can simply use the uri as an _id, recover
>>>>> the full original format by _source, and use basic search capabilities on
>>>>> the index, if escaping / nesting is not a big deal.
>>>>>
>>>>> However, in order to use resources with some more flexibility, I think
>>>>> the best would be to index them as "flat" as possible, then use an ad-hoc
>>>>> @context on the ES json to obtain the original json-ld again.
>>>>> This would be my ideal usage at the moment: it seems complex at first,
>>>>> but it's not. I'm currently experimenting with saving a @context for a _type,
>>>>> obtaining let's say a sort of _context, similar to a _mapping, to
>>>>> reconstruct the original semantics.
>>>>> If someone likes the idea, I'd like to share thoughts on that :-)
>>>>>
>>>>> On Friday, September 26, 2014 14:08:07 UTC+2, Jörg Prante wrote:
>>>>>>
>>>>>> Lukáš,
>>>>>>
>>>>>> of course you are right, RDF/XML looks complex and requires parsing.
>>>>>> The underlying principle of all RDF is a graph (or a series of triples in
>>>>>> the form of subject/predicate/object, where the triple series is a
>>>>>> serialization of the graph). So the challenge is first the parsing of RDF
>>>>>> input, second, constructing the model, and third, serializing the model
>>>>>> to an ES-friendly input (here: JSON-LD, sort of). RDF ensures that there is
>>>>>> a single model for all serializations.
>>>>>>
>>>>>> This technical perspective does not necessarily solve all challenges
>>>>>> that are inherent to the chosen data model. For example, nested resources
>>>>>> in RDF. It might be feasible to flatten nested resources by their
>>>>>> identifiers and generate one JSON after the other. Or it could be feasible
>>>>>> to keep nested resources intact and wrap them into nested structures in a
>>>>>> single ES JSON object.
>>>>>>
>>>>>> In my data model, I can map RDF subject IDs to ES doc IDs. Other data
>>>>>> models may prefer other approaches to select ES doc IDs.
>>>>>>
>>>>>> Jörg
>>>>>>
>>>>>> On Fri, Sep 26, 2014 at 10:11 AM, Lukáš Vlček <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Jörg,
>>>>>>>
>>>>>>> my concern is that RDF/XML allows one thing to be expressed in several
>>>>>>> ways. For example, if you take the FOAF specification then there are several
>>>>>>> ways to express that one Person knows another Person. One way is
>>>>>>> using reference IDs, the other way is nesting a Person inside another Person.
>>>>>>> See [1] for examples. My understanding is that although both ways express
>>>>>>> exactly the same information they lead to different XML representations and
>>>>>>> thus to different JSON-LD. Note that you can push such data into ES, but I
>>>>>>> wonder if you can then have any consistent way of querying such data.
>>>>>>>
>>>>>>> Maybe there is some way to preprocess the XML document and
>>>>>>> convert all nested Persons to references (would require arbitrary ID
>>>>>>> construction?). Or something similar. Though I am not sure this would be
>>>>>>> a generally applicable approach to any RDF data.
>>>>>>>
>>>>>>> [1] http://www.xml.com/pub/a/2004/02/04/foaf.html
>>>>>>>
>>>>>>> Regards,
>>>>>>> Lukas
>>>>>>>
>>>>>>> On Fri, Sep 26, 2014 at 9:28 AM, [email protected] <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> JSON-LD is perfect for ES indexing, as long as you use the
>>>>>>>> "compact" form of representation.
>>>>>>>>
>>>>>>>> http://www.w3.org/TR/json-ld-api/#compaction-algorithms
>>>>>>>>
>>>>>>>> Example:
>>>>>>>>
>>>>>>>> https://github.com/lanthaler/JsonLD/blob/master/Test/Fixtures/sample-compacted.jsonld
>>>>>>>>
>>>>>>>> This means you should use short field names and shorten IRIs to a
>>>>>>>> prefix form. This gives a convenient mapping to ES field names (e.g.
>>>>>>>> "dc:title" or "dc:creator"). The '@' fields can also be indexed and they do
>>>>>>>> not control anything special in ES (some @id may be mapped to ES _id but
>>>>>>>> for nested structures this does not match).
>>>>>>>>
>>>>>>>> I use my own RDF API and transform RDF graphs (so not only JSON-LD
>>>>>>>> but also other formats like N-Triples and RDF/XML) into XContent using this
>>>>>>>> method:
>>>>>>>>
>>>>>>>> https://github.com/xbib/xbib/blob/master/content/src/main/java/org/xbib/rdf/content/DefaultResourceContentBuilder.java
>>>>>>>>
>>>>>>>> I plan to extend this content building by interpreting rdf:type and
>>>>>>>> rdf:list etc. to generate correct ES JSON objects and arrays. There is also
>>>>>>>> an amount of work left to do for the plethora of XSD types in RDF literals
>>>>>>>> or for language tags.
>>>>>>>>
>>>>>>>> This will be subsumed into an RDF input/output plugin for an
>>>>>>>> ES-based Linked Data Platform
>>>>>>>>
>>>>>>>> http://www.w3.org/TR/ldp/
>>>>>>>>
>>>>>>>> but there is no ETA yet.
>>>>>>>>
>>>>>>>> Jörg
>>>>>>>>
>>>>>>>> On Fri, Sep 26, 2014 at 5:08 AM, Lukáš Vlček <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I think you will have to preprocess documents on your side first
>>>>>>>>> and then push them into ES individually (you can push in batches).
>>>>>>>>>
>>>>>>>>> As a side note, I would say json-ld is quite a low-level
>>>>>>>>> serialization of RDF data, IMO not optimal for ES indexing. Maybe better
>>>>>>>>> would be to find some RDF-OOM tool and have your RDF documents mapped to
>>>>>>>>> Java POJOs and serialize the POJOs into JSON instead (you can use the
>>>>>>>>> Jackson library for that, for example). This will give you better control
>>>>>>>>> over the whole RDF -> JSON conversion process.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Lukas
>>>>>>>>>
>>>>>>>>> On Thu, Sep 25, 2014 at 7:21 PM, abo <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I'm new to Elasticsearch, so forgive me if this is a basic
>>>>>>>>>> question or if it's in some documentation that I haven't read...
>>>>>>>>>>
>>>>>>>>>> I am trying to load a json-ld file into ES. The json-ld file was
>>>>>>>>>> generated from an RDF file, using Jena. The structure starts with:
>>>>>>>>>>
>>>>>>>>>> {
>>>>>>>>>> "@graph" :
>>>>>>>>>>
>>>>>>>>>> followed by the individual "documents", each with:
>>>>>>>>>>
>>>>>>>>>> {
>>>>>>>>>> "@id" :
>>>>>>>>>>
>>>>>>>>>> and a variable number of parameters in each.
>>>>>>>>>>
>>>>>>>>>> My question is how do I load this into ES and ensure that
>>>>>>>>>> documents are individually referenced (as opposed to the entire
>>>>>>>>>> json-ld file)?
>>>>>>>>>>
>>>>>>>>>> Do I need to doctor this json-ld file further in order to load it?
>>>>>>>>>>
>>>>>>>>>> Thanks for your help.
>>>>>>>>>>
>>>>>>>>>> -- abo
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "elasticsearch" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to [email protected].
>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb1-4c50-96c4-8f586e1e0807%40googlegroups.com.
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
