vogievetsky opened a new issue #8709: The timestamp column format is ignored if 
the column value is numeric
URL: https://github.com/apache/incubator-druid/issues/8709
 
 
   ### Affected Version
   
   Druid 0.16.0, and probably earlier versions as well.
   
   ### Description
   
   If the timestamp value in the JSON is a `Number`, the configured format is ignored and it is unclear what happens instead (the value seems to be treated as millis?).
   
   The issue is in this line of code: 
https://github.com/apache/incubator-druid/blob/84598fba3b283cbfd6a5addd2602c7b12ba8c00c/core/src/main/java/org/apache/druid/java/util/common/parsers/TimestampParser.java#L129
   
   # Repro
   
   Try to ingest the following rows:
   
   ```json
   {"name":"V","time":2019102120}
   {"name":"D","time":2019102121}
   ```
   
   Use `yyyyMMddHH` as the timestamp format.
   
   Here is a ready-made sampler request that reproduces it:
   
   ```json
   {
     "type": "index",
     "spec": {
       "type": "index",
       "ioConfig": {
         "type": "index",
         "firehose": {
           "type": "inline",
           "data": "{\"name\":\"V\",\"time\":2019102120}\n{\"name\":\"D\",\"time\":2019102121}"
         }
       },
       "dataSchema": {
         "dataSource": "sample",
         "parser": {
           "type": "string",
           "parseSpec": {
             "format": "json",
             "timestampSpec": {
               "column": "time",
               "format": "yyyyMMddHH"
             },
             "dimensionsSpec": {}
           }
         }
       }
     },
     "samplerConfig": {
       "numRows": 500,
       "timeoutMs": 15000,
       "cacheKey": "8451100357df41f5ac5502381506e674"
     }
   }
   ```
   
   Results in:
   
   ```json
   {
     "cacheKey": "8451100357df41f5ac5502381506e674",
     "numRowsRead": 2,
     "numRowsIndexed": 2,
     "data": [
       {
         "raw": "{\"name\":\"V\",\"time\":2019102120}",
         "parsed": {
           "__time": 2019102120,
           "name": "V"
         }
       },
       {
         "raw": "{\"name\":\"D\",\"time\":2019102121}",
         "parsed": {
           "__time": 2019102121,
           "name": "D"
         }
       }
     ]
   }
   ```
   
   The format is ignored entirely: `__time` is just the raw number. If you send the value as a string instead, it is parsed correctly.
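
To show how far apart the two readings are, here is a minimal Python sketch (not Druid code). It assumes the raw number ends up being read as epoch milliseconds, which is what the `__time` values above suggest, and that timestamps are in UTC. The helper names are hypothetical, and `%Y%m%d%H` stands in for the `yyyyMMddHH` format.

```python
from datetime import datetime, timedelta, timezone

RAW = 2019102120  # numeric "time" value from the first repro row

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def as_epoch_millis(value: int) -> datetime:
    """Read the number as milliseconds since the Unix epoch (assumed behavior)."""
    return EPOCH + timedelta(milliseconds=value)


def as_formatted(value: int) -> datetime:
    """Cast to a string first, then parse it as yyyyMMddHH (assumed UTC)."""
    parsed = datetime.strptime(str(value), "%Y%m%d%H")
    return parsed.replace(tzinfo=timezone.utc)


print(as_epoch_millis(RAW).isoformat())  # 1970-01-24T08:51:42.120000+00:00
print(as_formatted(RAW).isoformat())     # 2019-10-21T20:00:00+00:00
```

Under those assumptions the row lands in January 1970 rather than October 2019.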
   
   # Impact
   
   This is strange and confusing behavior. One would intuitively expect that setting a Joda time format would cause a numeric value to be cast to a string and parsed with that format.
   
   I was confused by it, as was this ASF Slack user:
   
   
![image](https://user-images.githubusercontent.com/177816/67253535-e76e1380-f42c-11e9-9ecc-ed2af06cc134.png)
   
   # Workaround
   
   You can either change your data to represent these columns as strings, or add the timestamp in the transform stage using `timestamp_parse("time_column_to_parse", 'yyyyMMddHH')`.
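
For the first option, a small pre-processing step over newline-delimited JSON could look like the sketch below. This is only an illustration: the function name `stringify_time` and the default column name `time` are assumptions taken from the repro rows, and it only touches integer values.

```python
import json


def stringify_time(line: str, column: str = "time") -> str:
    """Return the JSON row with an integer timestamp column rewritten as a string."""
    row = json.loads(line)
    value = row.get(column)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        row[column] = str(value)
    # Compact separators keep the output in the same shape as the input rows.
    return json.dumps(row, separators=(",", ":"), ensure_ascii=False)


rows = [
    '{"name":"V","time":2019102120}',
    '{"name":"D","time":2019102121}',
]
for line in rows:
    print(stringify_time(line))
# {"name":"V","time":"2019102120"}
# {"name":"D","time":"2019102121"}
```

Rows whose timestamp is already a string pass through unchanged, so the step is safe to run on mixed data.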
   
   
   
