On 10 Apr 2013, at 18:29, Mircea Markus <[email protected]> wrote:

> On 10 Apr 2013, at 17:45, Manik Surtani wrote:
>
>> Yes. We haven't quite designed how remote querying will work, but we have a
>> few ideas.
> Thanks for sharing :-)
>> First, let me explain how in-VM indexing works. An object's fields are
>> appropriately annotated so that when it is stored in Infinispan with a
>> put(), Hibernate Search can extract the fields and values, flatten them into a
>> Lucene-friendly "document", and associate it with the entry's key for
>> searching later.
>>
>> Now one approach to doing this when storing objects remotely is the
>> serialisation format. A format that can be parsed on the server side for
>> easy indexing. An example of this could be JSON (an appropriate
>> transformation will need to exist on the server side to strip out irrelevant
>> fields before indexing). This would be completely platform-independent, and
>> also support the interop you described below. The drawback? Slow JSON
>> serialisation and deserialization, and a very verbose data stream.
> What about using our own object definition, based on a fixed number of
> supported types: e.g. int, long, bigdecimal, String, Date and some more.
> Each client object would need to implement the logic to serialize and
> deserialize itself into this format, using some StreamWriters, a bit like our
> serializers today.
> The StreamWriters would be provided by us, for every supported
> programming language, and would have methods like writeInt, writeLong etc.
> Another nice thing we can add to this object scheme is versioning, which is
> useful for rolling upgrades.
> The server side would then index the known types using Lucene. The client
> should be able to define queries based on these objects and supported types
> (the query semantics to be defined).
> Disclaimer: not an original idea, there is already a similar approach used by
> other data grid providers.

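For readers following the thread, the typed-writer idea quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the thread does not define a wire format, so the type tags, the leading version byte, the `StreamWriter` method names in Python style, and the example `Person` values are all hypothetical.

```python
import io
import struct

# Hypothetical type tags; the thread does not specify any wire format.
TAG_INT, TAG_LONG, TAG_STRING = 1, 2, 3


class StreamWriter:
    """Hypothetical writer: each value is a 1-byte type tag plus a payload."""

    def __init__(self, schema_version):
        self._buf = io.BytesIO()
        # Assumption: a leading version byte, as one way to allow rolling upgrades.
        self._buf.write(struct.pack(">B", schema_version))

    def write_int(self, value):
        self._buf.write(struct.pack(">Bi", TAG_INT, value))

    def write_long(self, value):
        self._buf.write(struct.pack(">Bq", TAG_LONG, value))

    def write_string(self, value):
        data = value.encode("utf-8")
        self._buf.write(struct.pack(">BI", TAG_STRING, len(data)) + data)

    def to_bytes(self):
        return self._buf.getvalue()


def read_fields(blob):
    """Server-side view: recover (version, values) without knowing the class."""
    version = blob[0]
    pos, values = 1, []
    while pos < len(blob):
        tag = blob[pos]
        pos += 1
        if tag == TAG_INT:
            (value,) = struct.unpack_from(">i", blob, pos)
            pos += 4
        elif tag == TAG_LONG:
            (value,) = struct.unpack_from(">q", blob, pos)
            pos += 8
        elif tag == TAG_STRING:
            (length,) = struct.unpack_from(">I", blob, pos)
            pos += 4
            value = blob[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unknown type tag {tag}")
        values.append(value)
    return version, values


class Person:
    """Example client object that serializes itself through the writer."""

    def __init__(self, name, age):
        self.name, self.age = name, age

    def write_to(self, writer):
        writer.write_string(self.name)
        writer.write_int(self.age)


writer = StreamWriter(schema_version=1)
Person("Mircea", 30).write_to(writer)  # illustrative values only
blob = writer.to_bytes()
version, fields = read_fields(blob)
```

Because every value carries its own type tag, the server can walk the stream and hand the known types to an indexer without loading the client's classes, which is the property the proposal relies on. It is also essentially what existing tagged binary formats already do.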
Sounds a LOT like ProtoBufs. Or - yuck - CORBA. But generally,
wheel-reinvention? Why can't we use an existing library that provides this?

>>
>> Another approach may be to perform the field extraction on the client side,
>> so that the data sent to the server would be key=XXX (binary), value=YYY
>> (binary), indexing_metadata=ZZZ (JSON). This way the server does not need
>> to be able to parse the value for indexing, since the field data it needs is
>> already provided in a platform-independent manner (JSON). The benefit here
>> is that keys and values can still be binary, and can use an efficient
>> marshaller. The drawback is that field extraction needs to happen on the
>> client. Not hard for the Java client (bits of Hibernate Search could be
>> reused), but for non-Java clients this may increase complexity of those
>> clients quite a bit (much easier for dynamic language clients - python/ruby).
> The client would need to build a Lucene index itself and send it to the
> server, I guess Sanne/Emmanuel can comment more on the complexity involved
> here.
> Here are some limitations I see to this approach:
> - cannot define an index at runtime. If we want to do that, the client would
> need to storm all the data in the system and re-index it.
> - cannot run a query for data that is not indexed. I think this is a pretty
> common requirement as well.
>> This approach does *not* solve your problem below, because for interop you
>> will still need a platform-independent serialisation mechanism like Avro or
>> ProtoBufs for the object <--> blob <--> object conversion.
> Indeed. I think we should decide what approach we take and if we go for the
> former, not even suggest Apache Avro but implement our own scheme.

See above. Why implement our own? Portable and efficient object
serialisation is an entire sub-field of computer science in itself; do we
_really_ want to commit to building and maintaining our own?

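To make the second approach concrete, here is a minimal sketch of a put request that carries an opaque key, an opaque value and JSON indexing metadata. Everything in it is an assumption made for illustration: the request shape, the `ToyServer` class, and the use of `pickle` as a stand-in for an efficient marshaller are hypothetical, and a plain dictionary stands in for a Lucene index.

```python
import json
import pickle


class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age


def extract_index_fields(obj, indexed_fields):
    """Client side: pull out only the fields declared as indexed."""
    return {name: getattr(obj, name) for name in indexed_fields}


def build_put_request(key, value, indexed_fields):
    """Hypothetical request shape: binary key, binary value, JSON metadata."""
    return {
        "key": pickle.dumps(key),            # stand-in for any binary marshaller
        "value": pickle.dumps(vars(value)),  # opaque to the server
        "indexing_metadata": json.dumps(extract_index_fields(value, indexed_fields)),
    }


class ToyServer:
    """Stores blobs untouched and indexes only the JSON metadata."""

    def __init__(self):
        self.store = {}
        self.index = {}  # (field, value) -> set of key blobs

    def put(self, request):
        self.store[request["key"]] = request["value"]
        metadata = json.loads(request["indexing_metadata"])
        for field, field_value in metadata.items():
            self.index.setdefault((field, field_value), set()).add(request["key"])

    def query(self, field, field_value):
        keys = self.index.get((field, field_value), ())
        return [self.store[k] for k in keys]


server = ToyServer()
# Only "name" is indexed; the values are illustrative.
server.put(build_put_request("p1", Person("Mircea", 30), ["name"]))

hits = server.query("name", "Mircea")  # served from the metadata index
unindexed = server.query("age", 30)    # "age" was never sent as metadata
```

The server never decodes the value blob, so values can use any marshaller. The empty result for the `age` query illustrates the limitation raised in the quoted reply: data that was not extracted on the client cannot be queried without re-indexing.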
>> Personally, I prefer the second approach since it separates concerns
>> (portable indexes vs. portable values) plus would lead to (IMO) a
>> better-performing implementation. I'd love to hear others' thoughts though.
> I don't like the first approach because of the marshalling overhead. The
> former

You mean the latter?

> seems complex, doesn't scale (requires the implementation of indexing for
> every programming language) and limiting (indexes need to be defined a
> priori, cannot query for non-indexed data).
>>
>> Cheers
>> Manik
>>
>> On 10 Apr 2013, at 17:11, Mircea Markus <[email protected]> wrote:
>>
>>> That is write the Person object in Java and read a Person object in C#,
>>> assume a hotrod client for simplicity.
>>> Now at some point we'll have to run a query over the same hotrod, something
>>> like "give me all the Persons named Mircea".
>>> At this stage, the server side needs to be aware of the Person object in
>>> order to be able to run the query and select the relevant Persons. It needs
>>> a schema. Instead of suggesting Avro as a data interoperability protocol,
>>> we might want to define and use this schema instead: we'd need it anyway
>>> for remote querying and we won't have two ways of doing the same thing.
>>> Thoughts?
>>>
>>> Cheers,
>>> --
>>> Mircea Markus
>>> Infinispan lead (www.infinispan.org)
>>
>> --
>> Manik Surtani
>> [email protected]
>> twitter.com/maniksurtani
>>
>> Platform Architect, JBoss Data Grid
>> http://red.ht/data-grid
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)

--
Manik Surtani
[email protected]
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
