Hi all,

I am using PigStorage to parse a CSV file. Did I get the flow right?

HDFS_Block -> TextInputFormat -> (Key:offset, Value:line) -> PigStorage ->
Tuple -> Mapper ?
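
To make the question concrete, here is a toy Python sketch of the flow I have in mind. It is not Pig or Hadoop code: read_records, to_tuple and run_map are names made up for illustration, and the comma delimiter and sample data are assumptions.

```python
def read_records(block: bytes):
    """Yield (byte offset, line) pairs, as a line-oriented input format is assumed to."""
    offset = 0
    for raw in block.splitlines(keepends=True):
        # Key is the byte offset where the line starts, value is the line text.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def to_tuple(line: str, delimiter: str = ","):
    """Split one line into fields (stand-in for what the loader is assumed to do)."""
    return tuple(line.split(delimiter))


def run_map(block: bytes, map_fn):
    """Feed each parsed tuple to the map function; the offset key is dropped."""
    return [map_fn(to_tuple(line)) for _offset, line in read_records(block)]


block = b"1,alice,30\n2,bob,25\n"
records = list(read_records(block))
tuples = run_map(block, lambda t: t)
print(records)
print(tuples)
```

In this sketch every byte of the block is read and split, even when the map function only needs one field.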

If so, what are the input and output (key, value) pairs of the mapper?

How do formats like RC/ORC (which promise to read less input) work?

HDFS_Block -> ORCInputFormat (concerned columns only) -> (Key, Value) ->
ORCParser ? -> Tuple -> Mapper ?
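
For the columnar case, the picture I have in mind is roughly this toy Python sketch. It is not the ORC file format or its reader API: the in-memory columns dict and read_projected are hypothetical stand-ins that only illustrate column projection, i.e. reading the requested columns and leaving the rest untouched.

```python
# Hypothetical column-oriented layout: one list of values per column.
columns = {
    "id": [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "age": [30, 25, 41],
}


def read_projected(store, wanted):
    """Read only the wanted columns and zip them back into row tuples."""
    touched = [store[name] for name in wanted]  # other columns are never accessed
    return list(zip(*touched))


rows = read_projected(columns, ["id", "age"])
print(rows)
```

Here the "name" column is never touched, which is how I understand the "read less input" promise.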

Best regards,
