#general
@nhas3007: @nhas3007 has joined the channel
@nemanja: @nemanja has joined the channel
@lalitbhagtani01: @lalitbhagtani01 has joined the channel
@lalitbhagtani01: Hi all, I have a question about a server node failing. Here is the situation: my Pinot cluster is ingesting data from Kafka, and the server hosting a consuming segment dies before the segment is completed. When a new server comes up, will it start consuming those lost records from Kafka again or not? And if yes, how will it know which offset in which partition to resume from? Thanks
@mayanks: Yes, it will consume from the last checkpoint that Pinot saved, so no data loss
@lalitbhagtani01: Thanks for the quick response. So I only have to make sure my Kafka retains this data long enough. Thanks
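For context, the window Kafka must cover is roughly the consuming-segment flush interval plus any server downtime, since the checkpoint Pinot saved points at offsets that must still exist in the topic. A minimal `streamConfigs` sketch showing the flush thresholds that bound how long a consuming segment stays open (topic name and values are illustrative; exact key names vary slightly across Pinot versions):
```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "myTopic",
  "stream.kafka.consumer.type": "lowlevel",
  "realtime.segment.flush.threshold.time": "6h",
  "realtime.segment.flush.threshold.rows": "5000000"
}
```
With settings like these, the Kafka topic's retention should comfortably exceed six hours plus the worst-case server recovery time.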
@courage.noko: @courage.noko has joined the channel
@singhal.prateek3: @singhal.prateek3 has joined the channel
@benshahbaz: @benshahbaz has joined the channel
@singhal.prateek3: Hi folks, I have a couple of questions regarding the star-tree index: 1. In the image attached, is it possible for me to get D1-V1 and D1-Star as results of the same query? The assumption here is that since nodes in the star-tree index are pre-aggregated, I could somehow pull two of them out in one go. (I guess subqueries with two different filter conditions would be one solution, but Pinot does not support that.) 2. Is it possible to use the star-tree index in a realtime table? My understanding is that since the star-tree index requires pre-aggregation, it may not be applicable to realtime tables. If that's the case, is it possible to activate the star-tree index without upserts?
@mayanks: 1. What do you mean by pull out the nodes in one-go? 2. You can configure star-tree index for realtime, but yes upsert isn't supported with that. cc: @jackie.jxt
@singhal.prateek3: @mayanks … By pull-out, I basically mean to get D1-V1 and D1-Star as results of the same query
@mayanks: The star node is mutually exclusive with the other nodes, so the answer to 1 is no.
@singhal.prateek3: Thanks for the reply!
@g.kishore: It's better to run two queries, since they will end up using different nodes in the tree anyway
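To make the realtime star-tree point concrete, here is a minimal `starTreeIndexConfigs` block as it would appear under `tableIndexConfig` (column names `D1`, `D2`, `M1` are illustrative, not from the thread); per the answers above, this can be configured on a realtime table but not together with upsert:
```json
"tableIndexConfig": {
  "starTreeIndexConfigs": [{
    "dimensionsSplitOrder": ["D1", "D2"],
    "skipStarNodeCreationForDimensions": [],
    "functionColumnPairs": ["SUM__M1"],
    "maxLeafRecords": 10000
  }]
}
```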
#random
@nhas3007: @nhas3007 has joined the channel
@nemanja: @nemanja has joined the channel
@lalitbhagtani01: @lalitbhagtani01 has joined the channel
@courage.noko: @courage.noko has joined the channel
@singhal.prateek3: @singhal.prateek3 has joined the channel
@benshahbaz: @benshahbaz has joined the channel
@benshahbaz: @benshahbaz has left the channel
#troubleshooting
@dunithd: Folks, I have a sample data set like this:
```
"9/1/2014 6:04:00",40.7513,-73.935,"B02512"
"9/1/2014 6:08:00",40.7291,-73.9813,"B02512"
"9/1/2014 6:14:00",40.7674,-73.9841,"B02512"
```
Time is at minute granularity throughout the data set, so I mapped the time column like this in my schema file:
```json
"dateTimeFieldSpecs": [{
  "name": "pickupTime",
  "dataType": "STRING",
  "format": "1:MINUTES:SIMPLE_DATE_FORMAT:MM/dd/yyyy HH:mm:ss",
  "granularity": "1:MINUTES"
}]
```
And then in the table configuration:
```json
"segmentsConfig": {
  "timeColumnName": "pickupTime",
  "timeType": "MINUTES",
  "replication": "1",
  "schemaName": "pickups"
}
```
Hope this is fine?
@dunithd: Then my ingestion job failed with this:
```
Failed to generate Pinot segment for file - file:/Users/dunith/Projects/streamlit/rawdata/uber-raw-data-sep14.csv
java.lang.IllegalArgumentException: Invalid format: "null"
	at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
	at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:552) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:512) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:284) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:257) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
	at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:111) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263) ~[pinot-batch-ingestion-standalone-0.8.0-shaded.jar:0.8.0-9a0f41bc24243ff74315723b0153b534c2596e30]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
```
@dunithd: I can see the schema and table created in the data explorer, but I'm not sure what went wrong. I guess it's something to do with the time formatting?
@npawar: Could there be a null value in the time column in some row? The time column doesn't allow that
@npawar: Also, you might want to change HH to just H in your pattern string, since your hour values can be a single or double digit
@dunithd: I will check the time column for a null value. Also, will do the HH -> H change and see whether the issue persists. Thanks.
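Following @npawar's suggestion, a corrected `dateTimeFieldSpecs` sketch; note that the sample rows above ("9/1/2014 6:04:00") also have single-digit month and day, so `M/d` likely needs the same treatment as `H` (an assumption based on the sample data, not stated in the thread):
```json
"dateTimeFieldSpecs": [{
  "name": "pickupTime",
  "dataType": "STRING",
  "format": "1:MINUTES:SIMPLE_DATE_FORMAT:M/d/yyyy H:mm:ss",
  "granularity": "1:MINUTES"
}]
```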
@nhas3007: @nhas3007 has joined the channel
@msoni6226: Hi Team, we are running a hybrid table setup in our Pinot cluster and have configured the RealtimeToOffline task to move data from the realtime table to the offline table. However, we are not seeing any data being moved. On checking the controller logs, I see the errors below.
```
2021-10-13 07:41:57.360 ERROR [ZkBaseDataAccessor] [grizzly-http-server-4] paths is null or empty
2021-10-13 07:41:58.956 ERROR [ZkBaseDataAccessor] [grizzly-http-server-17] paths is null or empty
2021-10-13 00:06:19.325 ERROR [JobDispatcher] [HelixController-pipeline-task-PinotCluster-(275fe39b_TASK)] Job configuration is NULL for TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1633995887529
```
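For reference, a minimal `RealtimeToOfflineSegmentsTask` block as it would appear in the realtime table config (period values are illustrative; key names per the Pinot docs). The task also requires the controller's task scheduler to be enabled and at least one minion to be running, which is worth verifying given the "Job configuration is NULL" error:
```json
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "1d"
    }
  }
}
```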
@nemanja: @nemanja has joined the channel
@lalitbhagtani01: @lalitbhagtani01 has joined the channel
@courage.noko: @courage.noko has joined the channel
@singhal.prateek3: @singhal.prateek3 has joined the channel
@benshahbaz: @benshahbaz has joined the channel
#docs
@benshahbaz: @benshahbaz has joined the channel
#pinot-dev
@tharun.3c: @tharun.3c has joined the channel
@benshahbaz: @benshahbaz has joined the channel
#announcements
@albertobeiz: @albertobeiz has joined the channel
@nemanja: @nemanja has joined the channel
#getting-started
@tharun.3c: @tharun.3c has joined the channel
@otiennosharon: @otiennosharon has joined the channel
@otiennosharon: Hello, I am new to using Apache Pinot. I am trying to learn more about Pinot operators. Would anyone help me understand how it works and how to go about it?
@mayanks: Hi @otiennosharon this is a good starting point
@karinwolok1: Welcome, @otiennosharon! Happy to have you here. Let us know if that documentation is helpful and what else we can do to make your learning journey smoother. :slightly_smiling_face:
@otiennosharon: Thanks so much @mayanks and @karinwolok1. I believe I can start from here, and when I encounter challenges I will definitely reach out
@nemanja: @nemanja has joined the channel
@courage.noko: @courage.noko has joined the channel
@courage.noko: hey, I deployed Pinot on Kubernetes. Is there a way to set Google Cloud Storage configs such as `pinot.controller.storage.factory.gs.projectId` on the server/controller during deployment, or to update them afterwards?
@g.kishore: Does this help -
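The GCS settings are plain controller/server config properties, so on Kubernetes they can be baked into the generated `.conf` files (e.g. via the Helm chart's extra-config hooks; the exact mechanism depends on the chart version). A sketch of the controller side, with project, key path, and bucket values illustrative:
```
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=my-gcp-project
pinot.controller.storage.factory.gs.gcpKey=/path/to/credentials.json
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
controller.data.dir=gs://my-bucket/pinot-segments
```
The server and minion take the same `storage.factory` properties under their own prefixes (`pinot.server.*`, `pinot.minion.*`).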
#flink-pinot-connector
@singhal.prateek3: @singhal.prateek3 has joined the channel
@singhal.prateek3: @singhal.prateek3 has left the channel