On 10 Apr 2013, at 18:18, Emmanuel Bernard <[email protected]> wrote:

> I favor the first option for a few reasons:
> 
> - much easier client-side implementations
>  Frankly, rewriting the analyzer logic of Lucene in every language is
>  no piece of cake, and you are out of luck with custom analyzers

I'm not suggesting reimplementing all the analyser logic on the client, just 
the extraction of indexed fields into name/value pairs, to be sent alongside 
the blob value.
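For a Person entry, for example, that metadata could be as simple as a handful 
of pairs along the lines of {"name": "Mircea", "age": 30} travelling next to 
the marshalled bytes (the field names here are purely illustrative).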

> - more robust client implementation: if we change how indexing is done,
>  clients don't have to change
> - reindexing: if there is a need to rebuild the index, or if the user
>  decides to reindex data differently, you must be able to read the data
>  on the server side
> - validation: if you want to implement (cross entry) validation, the
>  server needs to be able to read the data.
> - async: validation and indexing can be done asynchronously on the
>  server, avoiding perceived latency between a client request and the
>  result

Valid points above, though.

> I'm not sure JSON should be the format, though. As you said, it's quite
> verbose, and strings are not exactly the most efficient way to process
> data.

What would that format be, then?

> 
> Emmanuel
> 
> 
> On Wed 2013-04-10 17:45, Manik Surtani wrote:
>> Yes.  We haven't quite designed how remote querying will work, but we have a 
>> few ideas.  First, let me explain how in-VM indexing works.  An object's 
>> fields are appropriately annotated so that when the object is stored in 
>> Infinispan with a put(), Hibernate Search can extract the fields and values, 
>> flatten them into a Lucene-friendly "document", and associate that document 
>> with the entry's key for searching later.
>> 
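>> As a rough sketch of that in-VM case (the Person class and its fields below 
>> are made up for illustration; the annotations and query calls are the 
>> standard Hibernate Search / Infinispan Query ones):
>>
>>   import java.io.Serializable;
>>   import org.hibernate.search.annotations.Field;
>>   import org.hibernate.search.annotations.Indexed;
>>
>>   @Indexed
>>   public class Person implements Serializable {
>>      @Field String name;
>>      @Field int age;
>>      // constructor, getters ...
>>   }
>>
>>   // Storing the entry extracts and indexes the annotated fields
>>   cache.put("person-1", new Person("Mircea", 30));
>>
>>   // Querying later goes through org.infinispan.query.Search / SearchManager;
>>   // the query itself is a plain org.apache.lucene.search.Query
>>   SearchManager sm = Search.getSearchManager(cache);
>>   Query q = sm.buildQueryBuilderForClass(Person.class).get()
>>               .keyword().onField("name").matching("Mircea").createQuery();
>>   List<Object> hits = sm.getQuery(q, Person.class).list();
>>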
>> Now, one approach to doing this when storing objects remotely is to pick a 
>> serialisation format that can be parsed on the server side for easy 
>> indexing.  An example of this could be JSON (an appropriate transformation 
>> would need to exist on the server side to strip out irrelevant fields before 
>> indexing).  This would be completely platform-independent, and would also 
>> support the interop you described below.  The drawback?  Slow JSON 
>> serialisation and deserialisation, and a very verbose data stream.
>> 
>> Another approach may be to perform the field extraction on the client side, 
>> so that the data sent to the server would be key=XXX (binary), value=YYY 
>> (binary), indexing_metadata=ZZZ (JSON).  This way the server does not need 
>> to be able to parse the value for indexing, since the field data it needs is 
>> already provided in a platform-independent manner (JSON).  The benefit here 
>> is that keys and values can still be binary, and can use an efficient 
>> marshaller.  The drawback is that field extraction needs to happen on the 
>> client.  That is not hard for the Java client (bits of Hibernate Search 
>> could be reused), but for non-Java clients it may increase complexity quite 
>> a bit (much easier for dynamic-language clients such as Python or Ruby).  
>> This approach does *not* solve your problem below, because 
>> for interop you will still need a platform-independent serialisation 
>> mechanism like Avro or ProtoBufs for the object <--> blob <--> object 
>> conversion.
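>>
>> Purely as a sketch of what the client side of that second approach might 
>> look like (none of this exists today: the three-argument put() is a 
>> hypothetical Hot Rod extension and the field names are invented):
>>
>>   // Key and value stay opaque binary, using whatever marshaller the client likes
>>   byte[] key   = marshaller.objectToByteBuffer(personId);
>>   byte[] value = marshaller.objectToByteBuffer(person);
>>
>>   // Flat name/value pairs extracted on the client, platform-independent
>>   String indexingMetadata = "{\"name\": \"Mircea\", \"age\": 30}";
>>
>>   // Hypothetical overload that ships the indexing metadata alongside the entry
>>   remoteCache.put(key, value, indexingMetadata);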
>> 
>> Personally, I prefer the second approach since it separates concerns 
>> (portable indexes vs. portable values) and would, IMO, lead to a 
>> better-performing implementation.  I'd love to hear others' thoughts, though.
>> 
>> Cheers
>> Manik
>> 
>> On 10 Apr 2013, at 17:11, Mircea Markus <[email protected]> wrote:
>> 
>>> That is, write the Person object in Java and read a Person object in C#; 
>>> assume a Hot Rod client for simplicity.
>>> Now at some point we'll have to run a query over that same Hot Rod 
>>> connection, something like "give me all the Persons named Mircea".
>>> At this stage, the server side needs to be aware of the Person object in 
>>> order to be able to run the query and select the relevant Persons. It needs 
>>> a schema. Rather than suggesting Avro as a data interoperability protocol, 
>>> we might want to define and use this schema instead: we'd need it anyway 
>>> for remote querying, and we wouldn't end up with two ways of doing the same 
>>> thing.
>>> Thoughts? 
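>>>
>>> FWIW, such a schema could be as small as this (purely illustrative, 
>>> Protobuf-style, with invented field names):
>>>
>>>   message Person {
>>>      required string name = 1;
>>>      optional int32 age = 2;
>>>   }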
>>> 
>>> Cheers,
>>> -- 
>>> Mircea Markus
>>> Infinispan lead (www.infinispan.org)
>>> 
>> 
>> --
>> Manik Surtani
>> [email protected]
>> twitter.com/maniksurtani
>> 
>> Platform Architect, JBoss Data Grid
>> http://red.ht/data-grid
>> 
>> 

--
Manik Surtani
[email protected]
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid


_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
