bilbingham commented on pull request #649: URL: https://github.com/apache/orc/pull/649#issuecomment-789971351
To reproduce (sorry, this is only a fragment; my test code is currently tied to a specific HDP cluster via hive and conf variables, and I don't have a generic test case at the moment): start with a Hive table fed by the Hive Streaming Data Ingest V2 API (https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2):

```sql
CREATE TABLE acidorc (
  col1 string,
  col2 string,
  col3 string,
  col4 string,
  col5 string
)
PARTITIONED BY (part1 string)
STORED AS ORC
tblproperties (
  "transactional" = "true",
  "orc.compress" = "SNAPPY",
  "orc.bloom.filter.columns" = "col1,col2,col3"
);
```

Then read it back with `OrcInputFormat`:

```java
Path p = new Path("/path/to/acidorc/files");
OrcInputFormat oif = new OrcInputFormat();
JobConf jc = new JobConf();

// Adding the schema to the conf resolves the issue as well:
// OrcConf.MAPRED_INPUT_SCHEMA.setString(jc,
//     "struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,"
//   + "currentTransaction:bigint,row:struct<col1:string,col2:string,col3:string,"
//   + "col4:string,col5:string>>");

// Works fine until you try to INCLUDE_COLUMNS:
OrcConf.INCLUDE_COLUMNS.setString(jc, "5");

Job theJob = new Job(jc);
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(theJob, p);
List<org.apache.hadoop.mapreduce.InputSplit> splits = oif.getSplits(theJob);
splits.forEach(split -> {
    try {
        // tac is a TaskAttemptContext set up elsewhere in the test harness
        org.apache.hadoop.mapreduce.RecordReader rr = oif.createRecordReader(split, tac);
        rr.initialize(split, tac);
        while (rr.nextKeyValue()) {
            OutputStream outputStream = new ByteArrayOutputStream();
            JsonWriter jw = new JsonWriter(new OutputStreamWriter(outputStream, "UTF-8"));
            // Field 5 of the ACID wrapper struct is the actual row
            OrcStruct row = (OrcStruct) ((OrcStruct) rr.getCurrentValue()).getFieldValue(5);
            jw.beginObject();
            for (int i = 0; i < row.getNumFields(); i++) {
                jw.name(row.getSchema().getFieldNames().get(i));
                jw.value(String.valueOf(row.getFieldValue(i)));
            }
            jw.endObject();
            jw.close();
        }
    } catch (Exception ex) {
        System.out.println("Bummer " + ex.getMessage());
    }
});
```
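As a side note, the schema string hard-coded in the commented-out `MAPRED_INPUT_SCHEMA` workaround is just the standard ACID event wrapper around the table's columns, so it can also be generated. A minimal sketch (the `AcidSchema` helper is hypothetical, not part of ORC, and assumes all user columns are strings, as in the table above):

```java
import java.util.List;
import java.util.stream.Collectors;

public class AcidSchema {
    // Build the ACID wrapper schema string around a list of string-typed
    // user columns, matching the struct in the workaround above.
    static String wrap(List<String> cols) {
        String row = cols.stream()
                .map(c -> c + ":string")
                .collect(Collectors.joining(","));
        return "struct<operation:int,originalTransaction:bigint,bucket:int,"
                + "rowId:bigint,currentTransaction:bigint,row:struct<" + row + ">>";
    }

    public static void main(String[] args) {
        // Prints the same schema string used in the commented-out workaround
        System.out.println(wrap(List.of("col1", "col2", "col3", "col4", "col5")));
    }
}
```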
