bilbingham commented on pull request #649: URL: https://github.com/apache/orc/pull/649#issuecomment-789971351
To reproduce (sorry, this is only a fragment; my test code is currently tied to a specific HDP cluster via hive and conf variables, and I don't have a generic test case at the moment): start with a Hive table fed by the Hive Streaming Data Ingest V2 API (https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2):

```sql
CREATE TABLE acidorc (
  col1 string,
  col2 string,
  col3 string,
  col4 string,
  col5 string
)
PARTITIONED BY (part1 string)
STORED AS ORC
tblproperties (
  "transactional" = "true",
  "orc.compress" = "SNAPPY",
  "orc.bloom.filter.columns" = "col1,col2,col3"
);
```

Then read it back with `OrcInputFormat`:

```java
Path p = new Path("/path/to/acidorc/files");
OrcInputFormat oif = new OrcInputFormat();
JobConf jc = new JobConf();

// Adding the schema to the conf resolves the issue as well:
// OrcConf.MAPRED_INPUT_SCHEMA.setString(jc,
//     "struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,"
//   + "currentTransaction:bigint,row:struct<col1:string,col2:string,col3:string,"
//   + "col4:string,col5:string>>");

// Works fine until you try to INCLUDE_COLUMNS:
OrcConf.INCLUDE_COLUMNS.setString(jc, "5");

Job theJob = new Job(jc);
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(theJob, p);
List<org.apache.hadoop.mapreduce.InputSplit> splits = oif.getSplits(theJob);
splits.forEach(split -> {
    try {
        // tac is a TaskAttemptContext set up elsewhere in the test harness
        org.apache.hadoop.mapreduce.RecordReader rr = oif.createRecordReader(split, tac);
        rr.initialize(split, tac);
        while (rr.nextKeyValue()) {
            OutputStream outputStream = new ByteArrayOutputStream();
            JsonWriter jw = new JsonWriter(new OutputStreamWriter(outputStream, "UTF-8"));
            // Field 5 of the ACID wrapper struct is the actual row
            OrcStruct row = (OrcStruct) ((OrcStruct) rr.getCurrentValue()).getFieldValue(5);
            jw.beginObject();
            for (int i = 0; i < row.getNumFields(); i++) {
                jw.name(row.getSchema().getFieldNames().get(i));
                jw.value(String.valueOf(row.getFieldValue(i)));
            }
            jw.endObject();
            jw.close();
        }
    } catch (Exception ex) {
        System.out.println("Bummer " + ex.getMessage());
    }
});
```
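As a side note, the schema string hard-coded in the commented-out `MAPRED_INPUT_SCHEMA` workaround is just the standard ACID event wrapper around the table's columns, so it can also be generated. A minimal sketch (the `AcidSchema` helper is hypothetical, not part of ORC, and assumes all user columns are strings, as in the table above):

```java
import java.util.List;
import java.util.stream.Collectors;

public class AcidSchema {
    // Build the ACID wrapper schema string around a list of string-typed
    // user columns, matching the struct in the workaround above.
    static String wrap(List<String> cols) {
        String row = cols.stream()
                .map(c -> c + ":string")
                .collect(Collectors.joining(","));
        return "struct<operation:int,originalTransaction:bigint,bucket:int,"
                + "rowId:bigint,currentTransaction:bigint,row:struct<" + row + ">>";
    }

    public static void main(String[] args) {
        // Prints the same schema string used in the commented-out workaround
        System.out.println(wrap(List.of("col1", "col2", "col3", "col4", "col5")));
    }
}
```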
