dianaarnos opened a new issue #7758:
URL: https://github.com/apache/pinot/issues/7758


   Hi there,
   
   ### What am I trying to do? 
   (in case I'm taking the wrong approach here and someone can point me to a 
better solution)
   
   I need to select all docs by month and year.
   
   Currently I'm ingesting data from a kafka topic into Pinot and here's an 
example for a message value:
   ```json
   {
     "some_uuid": "b17e63ac-0766-4d54-a23d-6929e15d13b8",
     "product_name": "my product",
     "condition": true,
     "operation": "create",
     "operationDate": "2021-05-20T12:55:54.000+0000"
   }
   ```
   
   As you can see, the `operationDate` field follows the RFC3339 format.
   
   ### What is happening
   First, I tried the following schema definition for `dateTimeFieldSpecs`:
   ```json
   "dateTimeFieldSpecs": [
       {
         "name": "operationDate",
         "dataType": "STRING",
         "format": 
"1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
         "granularity": "1:MILLISECONDS"
       }
     ]
   ```
   
   Then from the Controller UI, I tried running the following query,:
   ```sql
   select DATETIMECONVERT(operationDate, 
"1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-ddTHH:mm:ss.SSSZ", 
"1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM", "30:DAYS"), count(*) from my_table
   group by DATETIMECONVERT(operationDate, 
"1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-ddTHH:mm:ss.SSSZ", 
"1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM", "30:DAYS")
   ```
   And I got zero results, but the table has 20 docs that were successfully 
ingested. 
   
   I know that for months and years I should use the `DATETRUNC` function, so I 
tried the following query:
   ```sql
   select DATETRUNC('month', operationDate, 'MILLISECONDS'), count(*) from 
my_table
   group by DATETRUNC('month', operationDate, 'MILLISECONDS')
   ```
   And I get the following error on the UI:
   ```json
   [
     {
       "message": "QueryExecutionError:\nProcessingException(errorCode:450, 
message:InternalError:\njava.lang.NullPointerException\n\tat 
org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.mergeResults(GroupByOrderByCombineOperator.java:198)\n\tat
 
org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:117)\n\tat
 
org.apache.pinot.core.operator.combine.BaseCombineOperator.getNextBlock(BaseCombineOperator.java:49)\n\tat
 
org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)\n\tat
 
org.apache.pinot.core.operator.InstanceResponseOperator.getNextBlock(InstanceResponseOperator.java:40)\n\tat
 
org.apache.pinot.core.operator.InstanceResponseOperator.getNextBlock(InstanceResponseOperator.java:28)\n\tat
 
org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)\n\tat
 
org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:48)\n\tat
 org.apache.pinot.core.query.executor.ServerQueryExec
 utorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:296)\n\tat 
org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:216)\n\tat
 
org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60)\n\tat
 
org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:155)\n\tat
 
org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:139)",
       "errorCode": 200
     }
   ]
   ```
   And the following error inside pinot server:
   ```
   pinot-server_1        | 2021/11/12 16:46:08.659 ERROR 
[GroupByOrderByCombineOperator] [pqw-6] Caught exception while processing and 
combining group-by order-by for index: 1, operator: 
org.apache.pinot.core.operator.query.AggregationGroupByOrderByOperator, 
queryContext: QueryContext{_tableName='my_table_REALTIME', 
_selectExpressions=[datetrunc('month',operationDate,'MILLISECONDS'), count(*)], 
_aliasList=[null, null], _filter=null, 
_groupByExpressions=[datetrunc('month',operationDate,'MILLISECONDS')], 
_havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, 
_queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9997}, 
_debugOptions=null, 
_brokerRequest=BrokerRequest(querySource:QuerySource(tableName:my_table_REALTIME),
 pinotQuery:PinotQuery(dataSource:DataSource(tableName:my_table_REALTIME), 
selectList:[Expression(type:FUNCTION, functionCall:Function(operator:DATETRUNC, 
operands:[Expression(type:LITERAL, literal:<Literal stringValue:month>), 
Expression(type:IDENTIF
 IER, identifier:Identifier(name:operationDate)), Expression(type:LITERAL, 
literal:<Literal stringValue:MILLISECONDS>)])), Expression(type:FUNCTION, 
functionCall:Function(operator:COUNT, operands:[Expression(type:IDENTIFIER, 
identifier:Identifier(name:*))]))], groupByList:[Expression(type:FUNCTION, 
functionCall:Function(operator:DATETRUNC, operands:[Expression(type:LITERAL, 
literal:<Literal stringValue:month>), Expression(type:IDENTIFIER, 
identifier:Identifier(name:operationDate)), Expression(type:LITERAL, 
literal:<Literal stringValue:MILLISECONDS>)]))], 
queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9997}))}
   pinot-server_1        | java.lang.NumberFormatException: For input string: 
"2021-05-20T12:55:54.000+0000"
   pinot-server_1        |      at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
~[?:?]
   pinot-server_1        |      at java.lang.Long.parseLong(Long.java:692) 
~[?:?]
   pinot-server_1        |      at java.lang.Long.parseLong(Long.java:817) 
~[?:?]
   pinot-server_1        |      at 
org.apache.pinot.segment.local.realtime.impl.dictionary.StringOnHeapMutableDictionary.getLongValue(StringOnHeapMutableDictionary.java:134)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.segment.spi.index.reader.Dictionary.readLongValues(Dictionary.java:158)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readLongValues(DataFetcher.java:325)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.common.DataFetcher.fetchLongValues(DataFetcher.java:112) 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.common.DataBlockCache.getLongValuesForSVColumn(DataBlockCache.java:137)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getLongValuesSV(ProjectionBlockValSet.java:79)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.operator.transform.function.IdentifierTransformFunction.transformToLongValuesSV(IdentifierTransformFunction.java:84)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.operator.transform.function.DateTruncTransformFunction.transformToLongValuesSV(DateTruncTransformFunction.java:134)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.operator.docvalsets.TransformBlockValSet.getLongValuesSV(TransformBlockValSet.java:75)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.query.aggregation.groupby.NoDictionarySingleColumnGroupKeyGenerator.generateKeysForBlock(NoDictionarySingleColumnGroupKeyGenerator.java:82)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.query.aggregation.groupby.DefaultGroupByExecutor.process(DefaultGroupByExecutor.java:120)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.operator.query.AggregationGroupByOrderByOperator.getNextBlock(AggregationGroupByOrderByOperator.java:120)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.operator.query.AggregationGroupByOrderByOperator.getNextBlock(AggregationGroupByOrderByOperator.java:47)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49) 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.operator.combine.GroupByOrderByCombineOperator.processSegments(GroupByOrderByCombineOperator.java:106)
 
[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:105)
 
[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40) 
[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
   pinot-server_1        |      at 
java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
   pinot-server_1        |      at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
   pinot-server_1        |      at 
shaded.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
 
[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
shaded.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
 
[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
shaded.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
 
[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
[?:?]
   pinot-server_1        |      at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
[?:?]
   pinot-server_1        |      at java.lang.Thread.run(Thread.java:829) [?:?]
   ```
   
   I tried a couple of other ways to use `DATETRUNC` but had the same problem.
   
   So I tried another config for the schema, using the `timestamp`data type, as 
the following:
   ```json
   "dateTimeFieldSpecs": [
       {
         "name": "operationDate",
         "dataType": "TIMESTAMP",
         "format": 
"1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
         "granularity": "1:MILLISECONDS"
       }
     ]
   ```
   (obs.: the `TIMESTAMP` dataType is not listed in [the official 
docs](https://docs.pinot.apache.org/configuration-reference/schema#datetimefieldspec),
 I had to read the code and release details do find that it was existed)
   
   Then I got the following error inside pinot server:
   ```
   pinot-server_1        | 2021/11/12 16:15:58.933 ERROR 
[LLRealtimeSegmentDataManager_my_table__3__0__20211112T1615Z] 
[my_table__3__0__20211112T1615Z] Caught exception while transforming the 
record: {
   pinot-server_1        |   "fieldToValueMap" : {
   pinot-server_1        |     "some_uuid" : 
"fb13e0f6-a724-4876-9d4d-902ca94c2051",
   pinot-server_1        |     "operationDate" : "2020-07-20T12:55:54.000+0000",
   pinot-server_1        |     "condition" : "true",
   pinot-server_1        |     "product_name" : "My Product",
   pinot-server_1        |     "operation" : "delete"
   pinot-server_1        |   },
   pinot-server_1        |   "nullValueFields" : [ ]
   pinot-server_1        | }
   pinot-server_1        | java.lang.RuntimeException: Caught exception while 
transforming data type for column: operationDate
   pinot-server_1        |      at 
org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:120)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.segment.local.recordtransformer.CompositeTransformer.transform(CompositeTransformer.java:82)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:510)
 
[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:417)
 
[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:560)
 
[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at java.lang.Thread.run(Thread.java:829) [?:?]
   pinot-server_1        | Caused by: java.lang.IllegalArgumentException: 
Invalid timestamp: '2020-07-20T12:55:54.000+0000'
   pinot-server_1        |      at 
org.apache.pinot.spi.utils.TimestampUtils.toTimestamp(TimestampUtils.java:41) 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.common.utils.PinotDataType$10.toTimestamp(PinotDataType.java:524)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.common.utils.PinotDataType$9.convert(PinotDataType.java:485) 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.common.utils.PinotDataType$9.convert(PinotDataType.java:442) 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      at 
org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:114)
 
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   pinot-server_1        |      ... 5 more
   ```
   
   I noticed 2 open issues related to the RFC3339 date format, but they didn't 
help me:
   https://github.com/apache/pinot/issues/7195
   https://github.com/apache/pinot/issues/7276
   
   What am I missing here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to