waixiaoyu opened a new issue #7475: Exception when loading CSV data with NULL metric value in MapReduce in 0.13.0
URL: https://github.com/apache/incubator-druid/issues/7475
 
 
   When I used an index_hadoop task to load CSV data containing a null metric value, an exception was thrown and the MR job failed.
   
   
   
   ### Affected Version
   
   0.13.0
   
   ### Description
   
   **My input data is:**

       MapBasedInputRow{timestamp=2018-07-26T00:00:03.000Z, event={time_id=1532563203000, res_ins_id=6000807403000003, r11030_p1001=698.71, r11030_i10001=401.52, r11030_i10002=null}, dimensions=[res_ins_id, r11030_p1001]}
   
   **The exception is as follows:**
   
   
       Error: org.apache.druid.java.util.common.RE: Failure on row[1532563201000,6000807403000001,106.04,907.14,,]
           at org.apache.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:151)
           at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
           at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
           at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
           at java.security.AccessController.doPrivileged(Native Method)
           at javax.security.auth.Subject.doAs(Subject.java:422)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
           at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
       Caused by: java.lang.RuntimeException: Max parse exceptions exceeded, terminating task...
           at org.apache.druid.indexer.HadoopDruidIndexerMapper.handleParseException(HadoopDruidIndexerMapper.java:175)
           at org.apache.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:143)
           ... 8 more
       Caused by: org.apache.druid.java.util.common.parsers.ParseException: Found unparseable columns in row: [MapBasedInputRow{timestamp=2018-07-26T00:00:01.000Z, event={time_id=1532563201000, res_ins_id=6000807403000001, r11030_p1001=106.04, r11030_i10001=907.14, r11030_i10002=}, dimensions=[res_ins_id, r11030_p1001]}], exceptions: [Unable to parse value[] for field[r11030_i10002],]
           at org.apache.druid.segment.incremental.IncrementalIndex.getCombinedParseException(IncrementalIndex.java:765)
           at org.apache.druid.indexer.IndexGeneratorJob$IndexGeneratorMapper.innerMap(IndexGeneratorJob.java:385)
           at org.apache.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:137)
           ... 8 more
   
   
   **After debugging, I found that the error might be in IndexGeneratorJob.java:360:**

       InputRowSerde.toBytes(
           typeHelperMap,
           inputRow,
           aggsForSerializingSegmentInputRow
       )
   
   because the InputRow looks like this:

       MapBasedInputRow{timestamp=2018-07-26T00:00:03.000Z, event={time_id=1532563203000, res_ins_id=6000807403000003, r11030_p1001=698.71, r11030_i10001=401.52, r11030_i10002=null}, dimensions=[res_ins_id, r11030_p1001]}

   but in the **JSON scenario**, there is no element with a null or empty value.
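   The CSV-vs-JSON difference above can be sketched in a minimal standalone example (assumed class and field names, not Druid code): a CSV row always yields a value for every declared column, so a missing metric arrives as an empty string that fails numeric parsing, while a JSON row simply omits the key.

```java
import java.util.HashMap;
import java.util.Map;

public class EmptyMetricDemo {
    // Mimics the numeric-parse step that throws for field r11030_i10002.
    static double parseMetric(Object raw) {
        return Double.parseDouble(String.valueOf(raw)); // "" and null both throw
    }

    static boolean metricParses(Map<String, Object> event, String field) {
        try {
            parseMetric(event.get(field));
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // CSV row: the trailing ",," in the raw line yields an empty-string
        // value for the metric column, so the key is present but unparseable.
        Map<String, Object> csvEvent = new HashMap<>();
        csvEvent.put("r11030_p1001", "106.04");
        csvEvent.put("r11030_i10002", "");

        // JSON row: a missing metric is simply absent from the event map.
        Map<String, Object> jsonEvent = new HashMap<>();
        jsonEvent.put("r11030_p1001", "106.04");

        System.out.println("csv has key:  " + csvEvent.containsKey("r11030_i10002"));
        System.out.println("csv parses:   " + metricParses(csvEvent, "r11030_i10002"));
        System.out.println("json has key: " + jsonEvent.containsKey("r11030_i10002"));
    }
}
```

   This is why the same missing value that is harmless in JSON input triggers the ParseException for CSV input.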
   
   ### Naive Solution
   **So I tried to fix it with a naive (but easy) solution.**
   In HadoopDruidIndexerMapper.java:92, I added code like this:
   
       if (parser.getParseSpec() instanceof CSVParseSpec) {
         MapBasedInputRow row = (MapBasedInputRow) inputRow;
         // Drop any non-dimension (i.e. metric) entry whose value is null or empty,
         // so the later InputRowSerde.toBytes() call never sees an unparseable metric.
         for (Iterator<Map.Entry<String, Object>> it = row.getEvent().entrySet().iterator(); it.hasNext(); ) {
           Map.Entry<String, Object> e = it.next();
           Object metricValue = e.getValue();
           if (!row.getDimensions().contains(e.getKey()) && !isValidMetricValue(metricValue)) {
             it.remove();
           }
         }
       }
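   The snippet calls an `isValidMetricValue(...)` helper that is not shown. A minimal sketch of what such a helper might look like (a hypothetical implementation, not from the Druid codebase) would reject null, blank strings, and the literal string "null":

```java
public class MetricValueCheck {
    // Hypothetical isValidMetricValue(...) helper (an assumption, not Druid
    // code): a metric value is usable only if it is non-null, non-blank, and
    // not the literal string "null".
    static boolean isValidMetricValue(Object value) {
        if (value == null) {
            return false;
        }
        String s = value.toString().trim();
        return !s.isEmpty() && !"null".equalsIgnoreCase(s);
    }

    public static void main(String[] args) {
        System.out.println(isValidMetricValue("698.71")); // kept
        System.out.println(isValidMetricValue(""));       // dropped
        System.out.println(isValidMetricValue(null));     // dropped
    }
}
```

   With entries like `r11030_i10002=` or `r11030_i10002=null` removed from the event map, the row then behaves like the JSON case, where the metric key is simply absent.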
