The binary decoder needs some performance work that requires extra buffering (AVRO-327). Once that is done, adding deferred, lazy-load capabilities wouldn't be very intrusive, and I am willing to build them into the Java BinaryDecoder if they are needed.
-Scott

On Jan 22, 2010, at 6:38 PM, Philip Zeyliger wrote:

> Not with any of today's APIs. "SELECT col1, col3 FROM t" is handled
> easily: you construct a schema that only has those columns, and col2
> is skipped at read time.
>
> Does Hive have a use case for this that you're interested in? If you
> don't mind paying the buffer copy, you could probably write a
> "DeferredFoo" class that doesn't de-serialize certain structures...
>
> -- Philip
>
> On Fri, Jan 22, 2010 at 6:20 PM, Zheng Shao <[email protected]> wrote:
>> I noticed that avro has the "skip" functions which can help skip a
>> field when deserializing data.
>> This is good for column pruning in most cases, but we might be able to
>> do better in the following case.
>>
>> Let's say we have a query like this:
>>
>> CREATE TABLE t (col1 STRING, col2 STRING, col3 STRING);
>> SELECT col2 FROM t WHERE col3 = 'abcde';
>>
>> We want to get field col3 first, if that matches what we want, then we
>> want to get to field col2.
>>
>> Is there anyway to "remember" the current location of deserialization,
>> so that we can "resume" from that point?
>>
>> --
>> Yours,
>> Zheng
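For anyone following the thread, here is a rough sketch of the "remember and resume" idea applied to the example query. It does not use any Avro API: the Cursor class, its mark/reset methods, and selectCol2WhereCol3 are hypothetical names made up for illustration, and the sketch assumes the whole record is already buffered in a byte array (the buffer copy mentioned above). The only thing taken from Avro is the wire encoding of a string: a zigzag varint length followed by that many UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class DeferredFieldSketch {

    // Write a long the way Avro's binary encoding does: zigzag, then base-128 varint.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 byte length (as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Encode a row of string columns back to back, like a record of strings.
    static byte[] encodeRow(String... cols) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String c : cols) {
            writeString(out, c);
        }
        return out.toByteArray();
    }

    // Hypothetical cursor over a fully buffered record; not an Avro class.
    static final class Cursor {
        private final byte[] buf;
        private int pos;

        Cursor(byte[] buf) {
            this.buf = buf;
        }

        long readLong() {
            long z = 0;
            int shift = 0;
            int b;
            do {
                b = buf[pos++] & 0xFF;
                z |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return (z >>> 1) ^ -(z & 1); // undo zigzag
        }

        String readString() {
            int len = (int) readLong();
            String s = new String(buf, pos, len, StandardCharsets.UTF_8);
            pos += len;
            return s;
        }

        // Skip a string without materializing it.
        void skipString() {
            int len = (int) readLong();
            pos += len;
        }

        int mark() {          // "remember" the current location
            return pos;
        }

        void reset(int mark) { // "resume" from a remembered location
            pos = mark;
        }
    }

    // SELECT col2 FROM t WHERE col3 = wanted; returns null when the row is filtered out.
    static String selectCol2WhereCol3(byte[] row, String wanted) {
        Cursor c = new Cursor(row);
        c.skipString();               // col1 is never needed
        int col2Start = c.mark();     // remember where col2 begins
        c.skipString();               // defer decoding col2
        if (!c.readString().equals(wanted)) {
            return null;              // col3 did not match, col2 was never decoded
        }
        c.reset(col2Start);           // go back and decode col2 only now
        return c.readString();
    }

    public static void main(String[] args) {
        byte[] hit = encodeRow("a", "keep me", "abcde");
        byte[] miss = encodeRow("a", "drop me", "xyz");
        System.out.println(selectCol2WhereCol3(hit, "abcde"));  // prints "keep me"
        System.out.println(selectCol2WhereCol3(miss, "abcde")); // prints "null"
    }
}
```

The trade-off is that the record has to stay in memory until the predicate is evaluated. Rows that fail the col3 test never pay for building the col2 string, which is where the saving would come from.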
