+1 :-)

On 31.01.2006 at 22:06, Andrzej Bialecki wrote:

Hi,

I wonder, would it be a good idea to replace the (rather wasteful) 4-byte ints with Lucene's variable-byte int encoding, in all places where size matters? We could "borrow" the code from Lucene and create a VIntWritable for this purpose. I'm thinking specifically about the following places:

* UTF8 (2-byte string length)

* ArrayWritable/BytesWritable/TwoDArrayWritable (4-byte length)

* Properties and derived maps (like ContentProperties): all lengths are written as 4-byte ints.

* any Writable that consists of lists of values: the list size is currently serialized as a 4-byte int, e.g. ParseData.outlinks

Overall I think the size savings could be considerable, at the cost of some CPU.
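
To make the trade-off concrete, here is a minimal sketch of a variable-byte int codec in the style Lucene uses: 7 data bits per byte, low-order group first, with the high bit set on every byte except the last. The class name VIntSketch and the helpers encode/decode are hypothetical and exist only for illustration; this is not Lucene's actual source, and it is not the proposed VIntWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of Lucene, Nutch, or Hadoop.
public class VIntSketch {

    // Writes value using 1 to 5 bytes: 7 data bits per byte, low-order
    // group first, high bit set on every byte except the last one.
    public static void writeVInt(DataOutput out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift, so negative values end after 5 bytes
        }
        out.writeByte(value);
    }

    // Reads a value written by writeVInt.
    public static int readVInt(DataInput in) throws IOException {
        byte b = in.readByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Convenience helper (assumption, for demonstration only): encode to a byte array.
    public static byte[] encode(int value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeVInt(new DataOutputStream(buffer), value);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience helper (assumption, for demonstration only): decode from a byte array.
    public static int decode(byte[] bytes) {
        try {
            return readVInt(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this scheme, values up to 127 take one byte and values up to 16383 take two, so short string and list lengths would shrink from 4 bytes to 1 or 2. The worst case (values of 2^28 and above, and negative values) costs 5 bytes, one more than a fixed int.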

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



