Thanks Ryan,

I have never used either thrift or protobuf, so I will check them out.

I will think about whether I can get away with expanding into multiple rows and adding an extra grouping identifier to indicate that rows come from the same source record. The problem with that approach is that I might end up having to do a "distinct" operation at the end of some of my other scans, but (thinking aloud) this is probably OK, as I can create a key from the output rows (MD5 or so would suffice) and do an identity mapreduce on the results.

I don't really want to have to use 2 tables and do the join in mapreduce, as I think this will be a very common operation for me.

I can imagine that a scan using a filter and having to deserialize and do a "collection.contains()" operation on each row would be hyper slow compared to a comparison on the byte stream, which presumably can be done for primitives. Since they are basically List<String>, I might try a serialisation that can do a "collection contains" without having to construct the Java objects in the filter.

Let me think it through and I'll write back if it works.

Cheers,

Tim

On Sun, Jul 19, 2009 at 6:37 PM, Ryan Rawson<[email protected]> wrote:
> You're heading in the right direction. I'd suggest not using writables, and
> using a flexible system like thrift or protobuf. They support expandability
> without invalidating old data.
>
> But you want to be careful about putting too much code in a filter. It gets
> called frequently and I'm not exactly sure of the perf trade offs yet. Be
> prepared to be surprised I guess?
>
> On Jul 19, 2009 2:34 AM, "tim robertson" <[email protected]> wrote:
>
> Hi all,
>
> If I want to have a collection of values for a column (e.g.
> List<String>) do I need to write some kind of CollectionWritable and
> serialize and deserialize to bytes myself or am I missing something
> obvious?
> Would I also need to write ColumnValueFilters to do scans of rows with
> a collection containing a value?
>
> This is my first attempt at a many2one in a single row.
>
> Many thanks,
>
> Tim
>
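
For reference, here is a rough sketch of the byte-level "collection contains" idea from the reply at the top of this thread. It is an illustration only: ByteListCodec and its methods are hypothetical names, the length-prefixed layout (4-byte big-endian length, then UTF-8 bytes, per value) is an assumption rather than anything settled here, and it uses only the JDK, so nothing in it is HBase API. The point is that the membership check walks the stored bytes directly and never builds String or List objects.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper (not part of HBase): length-prefixed encoding of a List<String>. */
final class ByteListCodec {

    private ByteListCodec() {}

    /** Writes each value as a 4-byte big-endian length followed by its UTF-8 bytes. */
    static byte[] encode(List<String> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String value : values) {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            out.write(ByteBuffer.allocate(4).putInt(utf8.length).array(), 0, 4);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    /**
     * Returns true if the encoded list holds an element whose bytes equal needle.
     * Works on the raw bytes only; no String or List is constructed.
     */
    static boolean contains(byte[] encoded, byte[] needle) {
        int pos = 0;
        while (pos + 4 <= encoded.length) {
            // Read the 4-byte big-endian length prefix of the current element.
            int len = ((encoded[pos] & 0xFF) << 24)
                    | ((encoded[pos + 1] & 0xFF) << 16)
                    | ((encoded[pos + 2] & 0xFF) << 8)
                    | (encoded[pos + 3] & 0xFF);
            pos += 4;
            if (len < 0 || len > encoded.length - pos) {
                throw new IllegalArgumentException("corrupt encoding at offset " + pos);
            }
            // Only compare bytes when the lengths match; otherwise skip the element.
            if (len == needle.length && regionEquals(encoded, pos, needle)) {
                return true;
            }
            pos += len;
        }
        return false;
    }

    /** Compares needle against the same number of bytes in data, starting at offset. */
    private static boolean regionEquals(byte[] data, int offset, byte[] needle) {
        for (int i = 0; i < needle.length; i++) {
            if (data[offset + i] != needle[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A filter would encode the search value to UTF-8 bytes once, up front, and then call ByteListCodec.contains(cellValue, needleBytes) for each cell it is handed. Because every element carries its own length, a match can only begin on an element boundary, so a value like "bet" does not match inside "beta". Whether this is actually fast enough inside a filter would still need measuring.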
