Hi,

I am trying to use RCFile outside of Hive, though I am still using
the column serde and column struct to get the row. I found that the
way to tell RCFile which columns I am interested in is to set the
READ_COLUMN_IDS_CONF_STR key in the JobConf. This works except for
one thing: if the data originally has 5 columns and I ask RCFile to
project 3 of them, I get back a row of 5 columns, with data in the 3
columns I asked for and nulls in the other 2. I expected to get back
a row with exactly 3 columns. As a concrete example, assume the data
is as follows:

123 | 456 | "hadoop" | 23090L | 5.3D |

If I ask to project columns 0, 2 and 4, I get back:

123 | null | "hadoop" | null | 5.3D |

whereas I had expected to get:

123 | "hadoop" | 5.3D |

So my question is: is this the expected behavior, or am I doing
something wrong? If it is expected, is it by design that "higher
layers" (like Hive) reconstruct the row with the nulls weeded out?
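
To make clear what I mean by reconstructing the row, here is a minimal
sketch of the compaction a higher layer might have to do. It assumes
the full-width row is available as a plain List<Object>. ProjectedRow
and compact are hypothetical names of mine, not part of Hive or RCFile.
It selects values by projected column id rather than by dropping
nulls, so a genuine null in a projected column would survive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Hive or RCFile class.
final class ProjectedRow {
    private ProjectedRow() {}

    /**
     * Copies the values at the projected column ids out of a full-width
     * row, in the order the ids are given. Unprojected positions (which
     * come back as null) are never touched.
     */
    static List<Object> compact(List<Object> fullRow, int[] projectedIds) {
        List<Object> out = new ArrayList<>(projectedIds.length);
        for (int id : projectedIds) {
            out.add(fullRow.get(id));
        }
        return out;
    }
}
```

Calling compact on the 5-column row above with ids {0, 2, 4} would
give the 3-column row I expected.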

Thanks,
Ashutosh
