Hi,

I am trying to use RCFile outside of Hive, although I still use the column SerDe and column struct to get at each row. I found that the way to tell RCFile which columns I am interested in is to set the READ_COLUMN_IDS_CONF_STR key in the JobConf. This works, except for one thing: if the data originally has 5 columns and I ask RCFile to project 3 of them, I get back a row of 5 columns, with data in the 3 columns I asked for and nulls in the other 2. I expected to get back a row with exactly 3 columns. As a concrete example, assume the data is as follows:

123 | 456 | "hadoop" | 23090L | 5.3D |

and I ask it to project columns 0, 2 and 4. I get back

123 | null | "hadoop" | null | 5.3D |

whereas I had expected to get

123 | "hadoop" | 5.3D |

So my question is: is this the expected behavior, or am I doing something wrong? If it is expected, is it by design, with the "higher layers" (like Hive) expected to reconstruct the row with the nulls weeded out?

Thanks,
Ashutosh
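
P.S. To make the question concrete, this is roughly the compaction step a higher layer would have to perform if the behavior is by design. It is only a plain-Java sketch with no Hive classes involved: ProjectedRow and compact are hypothetical names, and it assumes the row has already been materialized as a full-width list with nulls in the unprojected positions, and that the same column ids passed via READ_COLUMN_IDS_CONF_STR are available to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive or RCFile.
final class ProjectedRow {

    // Builds a row holding only the projected columns, in the order given.
    // Selecting by column id (rather than dropping every null) keeps a
    // projected column whose value is genuinely null.
    static List<Object> compact(List<Object> fullRow, int[] projectedIds) {
        List<Object> out = new ArrayList<>(projectedIds.length);
        for (int id : projectedIds) {
            out.add(fullRow.get(id));
        }
        return out;
    }

    public static void main(String[] args) {
        // The full-width row from the example above, as returned after projection.
        List<Object> fullRow = Arrays.asList(123, null, "hadoop", null, 5.3d);
        System.out.println(compact(fullRow, new int[] {0, 2, 4}));
    }
}
```

Selecting by id instead of filtering out nulls matters because a projected column can legitimately contain a null, which a simple "drop all nulls" pass would lose.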