On 10 Apr 2013, at 18:18, Emmanuel Bernard <[email protected]> wrote:
> I favor the first option for a few reasons:
>
> - much easier client side implementations
> Frankly rewriting the analyzer logic of Lucene in every language is
> not a piece of cake and you are out of luck for custom analysers

I'm not suggesting porting all the analyser logic. Just the extraction of indexed fields into name/value pairs, to be sent alongside the blob value.

> - more robust client implementation: if we change how indexing is done
> clients don't have to change
> - reindexing: if there is a need to rebuild the index, or if the user
> decides to reindex data differently, you must be able to read the data
> on the server side
> - validation: if you want to implement (cross entry) validation, the
> server needs to be able to read the data.
> - async: validation and indexing can be done in an async way on the
> server and avoid perceived latency from a client request to the
> result

Valid points above though.

> I'm not sure JSON should be the format though. As you said it's quite
> verbose and string is not exactly the most efficient way to process
> data.

What would that format be, then?

>
> Emmanuel
>
>
> On Wed 2013-04-10 17:45, Manik Surtani wrote:
>> Yes. We haven't quite designed how remote querying will work, but we have a
>> few ideas. First, let me explain how in-VM indexing works. An object's
>> fields are appropriately annotated so that when it is stored in Infinispan
>> with a put(), Hibernate Search can extract the fields and values, flatten them
>> into a Lucene-friendly "document", and associate it with the entry's key for
>> searching later.
>>
>> Now one approach to doing this when storing objects remotely is the
>> serialisation format: a format that can be parsed on the server side for
>> easy indexing. An example of this could be JSON (an appropriate
>> transformation will need to exist on the server side to strip out irrelevant
>> fields before indexing).
>> This would be completely platform-independent, and
>> also support the interop you described below. The drawback? Slow JSON
>> serialisation and deserialisation, and a very verbose data stream.
>>
>> Another approach may be to perform the field extraction on the client side,
>> so that the data sent to the server would be key=XXX (binary), value=YYY
>> (binary), indexing_metadata=ZZZ (JSON). This way the server does not need
>> to be able to parse the value for indexing, since the field data it needs is
>> already provided in a platform-independent manner (JSON). The benefit here
>> is that keys and values can still be binary, and can use an efficient
>> marshaller. The drawback is that field extraction needs to happen on the
>> client. Not hard for the Java client (bits of Hibernate Search could be
>> reused), but for non-Java clients this may increase the complexity of those
>> clients quite a bit (much easier for dynamic language clients -
>> Python/Ruby). This approach does *not* solve your problem below, because
>> for interop you will still need a platform-independent serialisation
>> mechanism like Avro or ProtoBufs for the object <--> blob <--> object
>> conversion.
>>
>> Personally, I prefer the second approach since it separates concerns
>> (portable indexes vs. portable values) plus would lead to (IMO) a
>> better-performing implementation. I'd love to hear others' thoughts though.
>>
>> Cheers
>> Manik
>>
>> On 10 Apr 2013, at 17:11, Mircea Markus <[email protected]> wrote:
>>
>>> That is, write the Person object in Java and read a Person object in C#,
>>> assuming a Hot Rod client for simplicity.
>>> Now at some point we'll have to run a query over the same Hot Rod, something
>>> like "give me all the Persons named Mircea".
>>> At this stage, the server side needs to be aware of the Person object in
>>> order to be able to run the query and select the relevant Persons. It needs
>>> a schema.
>>> Instead of suggesting Avro as a data interoperability protocol,
>>> we might want to define and use this schema instead: we'd need it anyway
>>> for remote querying and we won't have two ways of doing the same thing.
>>> Thoughts?
>>>
>>> Cheers,
>>> --
>>> Mircea Markus
>>> Infinispan lead (www.infinispan.org)
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> [email protected]
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> --
>> Manik Surtani
>> [email protected]
>> twitter.com/maniksurtani
>>
>> Platform Architect, JBoss Data Grid
>> http://red.ht/data-grid

--
Manik Surtani
[email protected]
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid
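To make the second approach discussed above (client-side field extraction) concrete, here is a minimal sketch of what a Java client might send for a put: an opaque binary key, an opaque binary value from whatever marshaller the client uses, and the indexed fields as flat JSON name/value pairs. Everything here is hypothetical and for illustration only: `Person`, `IndexedPut`, `extractIndexedFields`, `preparePut` and the hand-rolled JSON writer are not Infinispan, Hibernate Search or Hot Rod API, and the wire format is not modelled. The JSON writer assumes flat string/number fields and escapes only quotes and backslashes.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ClientSideIndexingSketch {

    // Hypothetical domain object; in-VM, annotations would mark which fields are indexed.
    record Person(String name, int age, byte[] photo) {}

    // Hypothetical put payload: key and value stay opaque binary,
    // indexing metadata travels as platform-independent JSON.
    record IndexedPut(byte[] key, byte[] value, String indexingMetadata) {}

    // Client-side field extraction: only indexed fields become name/value pairs
    // (photo is deliberately left out; the server never needs to parse it).
    static Map<String, Object> extractIndexedFields(Person p) {
        Map<String, Object> fields = new LinkedHashMap<>(); // keeps field order stable
        fields.put("name", p.name());
        fields.put("age", p.age());
        return fields;
    }

    // Minimal JSON writer for a flat map of strings and numbers (assumption: no nesting).
    static String toJson(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":");
            Object v = e.getValue();
            if (v instanceof Number) {
                sb.append(v); // numbers are emitted unquoted
            } else {
                sb.append('"').append(escape(String.valueOf(v))).append('"');
            }
        }
        return sb.append('}').toString();
    }

    // Escapes backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Bundles the binary key/value with the extracted indexing metadata.
    static IndexedPut preparePut(String key, Person p, byte[] marshalledValue) {
        return new IndexedPut(
                key.getBytes(StandardCharsets.UTF_8),
                marshalledValue,
                toJson(extractIndexedFields(p)));
    }

    public static void main(String[] args) {
        Person p = new Person("Mircea", 30, new byte[] {1, 2, 3});
        // Stand-in for the output of an efficient binary marshaller.
        byte[] blob = {(byte) 0xCA, (byte) 0xFE};
        IndexedPut put = preparePut("person:1", p, blob);
        System.out.println(put.indexingMetadata()); // prints {"name":"Mircea","age":30}
    }
}
```

The point of the split is that the server only ever reads `indexingMetadata`, so the value can use any efficient marshaller; the cost, as noted in the thread, is that each non-Java client has to reimplement `extractIndexedFields` for its own types.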
