#general


@maarten.dubois: @maarten.dubois has joined the channel
@zsolt: Is there an SQL equivalent of TOP from PQL?
  @mayanks: Limit + order by
  @zsolt: I thought it works differently with a multi-column group by
  @zsolt: I'm looking for a way to limit group by groups without limiting the overall result set
  @mayanks: You mean multiple aggregations? If so, with SQL, the ordering has to be explicit.
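The SQL stand-in for a per-group TOP is a window function: rank rows within each group and filter on the rank, which caps each group without capping the overall result set. A minimal runnable sketch with a hypothetical table, using SQLite for illustration:

```python
import sqlite3

# Per-group limiting in standard SQL: ROW_NUMBER() ranks rows inside each
# group, and an outer filter keeps only the top N per group.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (subject TEXT, student TEXT, score REAL);
INSERT INTO scores VALUES
  ('math','a',90),('math','b',80),('math','c',70),
  ('eng','d',95),('eng','e',85),('eng','f',75);
""")
# Keep at most 2 rows per subject, ordered by score descending.
rows = conn.execute("""
SELECT subject, student, score FROM (
  SELECT subject, student, score,
         ROW_NUMBER() OVER (PARTITION BY subject ORDER BY score DESC) AS rn
  FROM scores
) WHERE rn <= 2
ORDER BY subject, score DESC
""").fetchall()
print(rows)
```

A plain `LIMIT` would truncate the whole result; the rank filter truncates each group independently.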
@lipicsbarna: @lipicsbarna has joined the channel
@tiger: is there any documentation on the LASTWITHTIME aggregation function? I'm trying to use it right now but it seems that it only supports time columns that are LONG/INT?
  @mark.needham: yes, it only supports timestamps/epochs at the moment. I'll add some documentation on it, thanks for the reminder!
  @tiger: thanks!
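The semantics discussed above (per group, pick the value whose LONG/INT epoch time is greatest) can be sketched in plain Python. This is only an illustration of the behavior, not Pinot's implementation:

```python
# Conceptual sketch of LASTWITHTIME-style semantics: for each group, keep
# the value whose numeric epoch time column is greatest.
def last_with_time(rows):
    # rows: iterable of (group, epoch_ts, value) tuples
    latest = {}
    for group, ts, value in rows:
        if group not in latest or ts > latest[group][0]:
            latest[group] = (ts, value)
    return {group: value for group, (ts, value) in latest.items()}

events = [
    ("deviceA", 1638400000, 10),
    ("deviceA", 1638400100, 12),  # newest row for deviceA
    ("deviceB", 1638400050, 7),
]
print(last_with_time(events))  # {'deviceA': 12, 'deviceB': 7}
```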

#random


@maarten.dubois: @maarten.dubois has joined the channel
@lipicsbarna: @lipicsbarna has joined the channel

#troubleshooting


@mapshen: When querying Pinot via Trino (362), the avg() function doesn’t seem to work correctly. It always returns no data…
  @mark.needham: Do you see any error anywhere in the logs? `logs/pinot-all.log` on each of the components. Might be worth creating a GitHub issue if you have an easy way that someone else can reproduce/debug -
  @mapshen: Opened last night
  @mapshen: Seems to be an issue with the pinot connector in Trino
  @mapshen: @elon.azoulay FYI
  @mark.needham: ah cool, ok
  @elon.azoulay: Question @mapshen - can you paste the table schema? I wanted to see if there are multiple columns which are the same when case is ignored
  @mapshen: No, there isn’t. I'm very positive about it
  @mapshen: other aggregates like max, min, count work
  @elon.azoulay: Does the column which does not work contain uppercase characters?
  @mapshen: seems this is what is missing in the implementation of avg:
  @mapshen: in pinot, all columns are uppercase
  @elon.azoulay: omg - I have that locally (we use trino connector in prod) but I need to put that into one of the open pr's I have :facepalm: I will do this tonight, latest tomorrow. Thanks for catching this @mapshen, apologies for the issue.
  @elon.azoulay: There are a few other fixes in the 2 open pr's I have as well, will get them to you - hopefully will be in the latest release (trino 365)
  @elon.azoulay: Trying to push everything I have so that open source == what we have.
  @elon.azoulay: As soon as I update it I'll let you know
  @mapshen: Thanks so much @elon.azoulay!
@syedakram93: is it possible to create the segment file name with a date in it, instead of time in millis (long)? e.g. testtable_OFFLINE_1637625600000_1637712000000_1469.tar.gz to testtable_OFFLINE_2021-11-01_2021-11-05_1.tar.gz
  @mark.needham: The segment name is derived from the data values in the date time column, so the only way (at least right now) to have the segment files named like that would be if you had a datetime column that stored the timestamp in a string format. I guess in theory you could create an unused column that has the date in that format, but I'd have to check how the segment file name generator works out which column to use as part of its name.
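For reference, the millis in a default segment name are the min/max values of the table's time column, so they can always be decoded back to dates. A small sketch (the helper below is illustrative, not part of Pinot):

```python
from datetime import datetime, timezone

# The two long numbers in a segment name like
# testtable_OFFLINE_1637625600000_1637712000000_1469.tar.gz are the
# start/end epoch millis of the segment's time column values.
def describe_segment(name: str) -> str:
    parts = name.replace(".tar.gz", "").split("_")
    start_ms, end_ms = int(parts[-3]), int(parts[-2])
    fmt = lambda ms: datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{fmt(start_ms)} to {fmt(end_ms)}"

print(describe_segment("testtable_OFFLINE_1637625600000_1637712000000_1469.tar.gz"))
# 2021-11-23 to 2021-11-24
```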
@yeongjukang: Hello folks, I am struggling to get a hybrid table working. My configs are below. Only the offline table's data is displayed; the streaming data isn't blended in. I can find logs of Kafka events being consumed, but no broker, controller, or server errors. My hybrid table test is running on minikube, with Pinot 0.9.0. The process I followed was creating the offline table, then the realtime table.
• bin/pinot-admin.sh AddTable -tableConfigFile hybrid_realtime.json -schemaFile hybrid_schema.json -exec
• bin/pinot-admin.sh AddTable -tableConfigFile hybrid_offline.json -schemaFile hybrid_schema.json -exec
1. hybrid_schema.json
```
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    { "name": "studentID", "dataType": "INT" },
    { "name": "firstName", "dataType": "STRING" },
    { "name": "lastName", "dataType": "STRING" },
    { "name": "gender", "dataType": "STRING" },
    { "name": "subject", "dataType": "STRING" },
    { "name": "doNotFailPlease", "dataType": "STRING", "defaultNullValue": "" },
    { "name": "ts2", "dataType": "TIMESTAMP" }
  ],
  "metricFieldSpecs": [
    { "name": "score", "dataType": "FLOAT" }
  ],
  "dateTimeFieldSpecs": [
    { "name": "ts", "dataType": "TIMESTAMP", "format": "1:SECONDS:EPOCH", "granularity": "1:SECONDS" }
  ],
  "primaryKeyColumns": [ "studentID" ]
}
```
2. hybrid_offline.json
```
{
  "tableName": "transcript_hybrid",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "replication": 1,
    "timeColumnName": "ts",
    "timeType": "SECONDS"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableIndexConfig": { "loadMode": "MMAP" },
  "metadata": {}
}
```
3. hybrid_realtime.json
```
{
  "tableName": "transcript_hybrid",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "timeType": "SECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "transcript",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "kafka.local-pinot.svc.cluster.local:9092",
      "realtime.segment.flush.threshold.time": "6h"
    }
  },
  "metadata": { "customConfigs": {} },
  "routing": { "instanceSelectorType": "strictReplicaGroup" },
  "upsertConfig": { "mode": "FULL" }
}
```
  @mark.needham: do you see any segments are created for the realtime table? You can navigate to the tables/segments via this screen -
  @mark.needham: The full logs of each component are at `logs/pinot-all.log` - did you already check for entries in there?
  @yeongjukang: it seems like `logs/pinot-all.log` is frozen, so I am debugging with stdout from the only server, server-0. A segment is created and stdout shows it as consuming, but the data doesn't display.
  @yeongjukang: @mark.needham should i wait until initial flush?
  @mark.needham: I thought you should be able to query while it's consuming, but let's check with @mayanks to be certain. In the meantime you could experiment with the threshold time if it's not too hard to change. I think right now it's set to `6h` - maybe reduce that to a smaller value?
  @mayanks: Events are ready to be served as soon as they are ingested, segment does not need to be flushed for that.
  @mayanks: I am not sure if I follow the original issue, is it that query is returning rows only from offline table?
  @mayanks: If logs are missing, then it is log4j setting issue.
  @mayanks: If query is returning offline data but not real-time then your time boundary is incorrect (check time value and units). You can check min time in real-time and max time in offline. This could happen if offline has more recent data than real-time (usually due to time unit issue).
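The unit-mismatch failure mode described above can be shown with a tiny sketch (hypothetical values): the broker's time boundary comes from the offline table's max time, so if offline stores millis while realtime stores seconds, offline "looks" newer than all realtime data and hybrid queries return only offline rows.

```python
# Sketch of a time-unit mismatch between the two sides of a hybrid table.
offline_max = 1638316800000  # millis, 2021-12-01 00:00:00 UTC
realtime_min = 1638403200    # seconds, 2021-12-02 00:00:00 UTC (newer!)

# Compared naively, offline appears more recent than all realtime data,
# so the time boundary excludes the realtime side entirely.
print(offline_max > realtime_min)         # True (the wrong conclusion)

# Normalized to the same unit, realtime is actually one day newer:
print(realtime_min * 1000 - offline_max)  # 86400000
```

Checking `min(ts)` on the realtime table against `max(ts)` on the offline table, in the same unit, is the quick sanity test.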
  @yeongjukang: @mayanks For data return, yes only from offline table
  @yeongjukang: @mayanks I didn’t check the source, but I think its flushing size could make it look frozen (red ellipse part)
  @mayanks: That might just be a logging issue. Can you run a query on the individual tables (add `_REALTIME` to the table name in the query) and see if you get any data
  @mayanks: If yes, then most definitely it is time boundary issue
  @yeongjukang: @mayanks Following your guide, I tried these cases, but realtime data does not appear. Below are the type / granularity pairs:
• TIMESTAMP / ms
• TIMESTAMP / s
• LONG / ms
• LONG / s
  @mayanks: Can you run select count(*) from transcript_hybrid_REALTIME?
  @yeongjukang: Oh.. I got this one
  @yeongjukang: and I can see log from INSTANCES/broker-headless/ERRORS at zk UI
  @yeongjukang: let me summarize
  @yeongjukang: ```"HELIX_ERROR 20211202-132039.000850 STATE_TRANSITION 8c6c680c-f89a-450f-b9c9-a4464d517617": { "AdditionalInfo": "Exception while executing a state transition task transcript_hybrid_REALTIMEjava.lang.reflect.InvocationTargetException\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404)\n\tat org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331)\n\tat org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97)\n\tat org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: java.lang.IllegalStateException: Failed to find schema for table: transcript_hybrid_OFFLINE\n\tat shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:518)\n\tat org.apache.pinot.broker.routing.timeboundary.TimeBoundaryManager.<init>(TimeBoundaryManager.java:74)\n\tat org.apache.pinot.broker.routing.RoutingManager.buildRouting(RoutingManager.java:371)\n\tat org.apache.pinot.broker.broker.helix.BrokerResourceOnlineOfflineStateModelFactory$BrokerResourceOnlineOfflineStateModel.onBecomeOnlineFromOffline(BrokerResourceOnlineOfflineStateModelFactory.java:80)\n\t... 
12 more\n", "Class": "class org.apache.helix.messaging.handling.HelixStateTransitionHandler", "MSG_ID": "5173dca5-b009-46cd-b6be-15e57b3a5b01", "Message state": "READ" }```
  @yeongjukang: ```"HELIX_ERROR 20211202-132039.000904 STATE_TRANSITION e7d869d7-0a91-4067-990d-e954ef5228ef": { "AdditionalInfo": "Message execution failed. msgId: 5173dca5-b009-46cd-b6be-15e57b3a5b01, errorMsg: java.lang.reflect.InvocationTargetException", "Class": "class org.apache.helix.messaging.handling.HelixStateTransitionHandler", "MSG_ID": "5173dca5-b009-46cd-b6be-15e57b3a5b01", "Message state": "READ" },```
  @yeongjukang: @mayanks these 2 repeat
  @mark.needham: @mayanks it seems like that line of code assumes that the name of the schema is the same as the table name, which in this case it isn't
  @mark.needham: @yeongjukang can you try changing the name of your schema to be `transcript_hybrid` to test out that hypothesis?
  @yeongjukang: @mark.needham @mayanks Thank you all! It works!
  @yeongjukang: @mark.needham By the way, is there any plan to allow different name of table and schema?
  @mark.needham: I think it should allow that - seems like there's a bug. I've created an issue for it -
  @mark.needham: hope that's ok to include the config you shared?
  @mark.needham: lemme know if not and I'll update
  @yeongjukang: @mark.needham my pleasure, no worries.
  @yeongjukang: @mark.needham that was from the pinot official doc :grin: by the way, what is the recommended method to export a realtime table's sealed segment as CSV, JSON, or Avro? I want to extract it and upload it to S3; after loading, dropping the segment is my next step. Can it be done with a Swagger API call?
  @mark.needham: I'm not sure how to convert a segment into those formats, but Pinot has the concept of a 'deep store' which keeps compressed copies of segment files in various file systems, of which one is S3. Would that do what you want?
  @yeongjukang: @mark.needham I use an ingestion job to import data from S3 (the blue part), but couldn't find the feature for the red part: realtime to S3.
@maarten.dubois: @maarten.dubois has joined the channel
@deemish2: Hello everyone, I am backfilling data using Spark batch ingestion. Can we handle duplicate data during the backfill, so that it won’t get duplicated in the OFFLINE table?
  @mark.needham: Can you explain a bit more what you mean by duplication? Do you mean duplication between data that is already loaded into a realtime table and data that you're gonna backfill into an offline table?
  @mayanks: If you push a segment with name that already exists in Pinot, then the existing segment is overwritten. As long as your daily and backfill jobs produce consistent segment names for a given time period, data will be overwritten.
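A sketch of why that makes backfills idempotent (the naming scheme below is a hypothetical illustration, not Pinot's actual segment name generator): as long as both the daily job and the backfill job derive the segment name purely from the table and time period, the backfill push lands on the same name and replaces the old segment instead of duplicating it.

```python
# Deterministic segment naming keyed only on table + time period: pushing
# a backfill segment with the same name overwrites the original segment.
def segment_name(table: str, day: str, sequence: int = 0) -> str:
    # Hypothetical scheme for illustration purposes.
    return f"{table}_OFFLINE_{day}_{sequence}"

daily = segment_name("events", "2021-12-01")
backfill = segment_name("events", "2021-12-01")
assert daily == backfill  # same name => replace, not duplicate
print(daily)
```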
@lipicsbarna: @lipicsbarna has joined the channel
@mapshen: Could it be that although the `validDocsIndex` is up-to-date, the latest row hasn’t been added?
@elon.azoulay: Question about replica groups and pools for a realtime table: If we set the replicas per partition to 1 in the segment config, and in the instance config set numReplicaGroups to 1 but have 3 pools, do the segments in the table end up having 3 replicas? i.e. 1 per pool?
  @elon.azoulay: ex. tableconfig:
```
"segmentsConfig": {
  ...
  "segmentPushFrequency": "daily",
  "segmentPushType": "APPEND",
  "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
  "replicasPerPartition": "3"
},
...
"instanceAssignmentConfigMap": {
  "CONSUMING": {
    "tagPoolConfig": {
      "tag": "StatusCentral_REALTIME",
      "poolBased": true,
      "numPools": 3
    },
    "replicaGroupPartitionConfig": {
      "replicaGroupBased": true,
      "numReplicaGroups": 1,
      "numInstancesPerPartition": 0,
      "numPartitions": 1
    }
  },
  "COMPLETED": {
    "tagPoolConfig": {
      "tag": "StatusCentral_REALTIME",
      "poolBased": true,
      "numPools": 3
    },
    "replicaGroupPartitionConfig": {
      "replicaGroupBased": true,
      "numReplicaGroups": 1,
      "numInstancesPerPartition": 0,
      "numPartitions": 1
    }
  }
}
```
  @elon.azoulay: Ah, it doesn't look like it
  @elon.azoulay: So if I have replicas per partition set to 3, replica groups set to 3 and numPools set to 3 it seems to have 1 replica per pool. Does that sound correct?
  @elon.azoulay: we also have only 1 instance per pool - just testing this out in staging now.
@bagi.priyank: hello, i see these messages in the logs a lot
```
2021/12/02 21:32:12.132 ERROR [SegmentBuildTimeLeaseExtender] [pool-4-thread-1] Failed to send lease extension for km_mp_play_startree__63__21__20211202T2127Z
2021/12/02 21:32:18.330 ERROR [SegmentBuildTimeLeaseExtender] [pool-4-thread-1] Failed to send lease extension for km_mp_play_startree__103__21__20211202T2127Z
2021/12/02 21:32:24.354 ERROR [SegmentBuildTimeLeaseExtender] [pool-4-thread-1] Failed to send lease extension for km_mp_play_startree__111__21__20211202T2127Z
```
and i see that the server is marked as dead in the cluster manager. how can i get around this? thanks in advance.
  @mayanks: Try to restart the server?
  @bagi.priyank: that is pretty much what i have been doing so far. but it happens quite often and i was hoping there might be some config that i could tweak to reduce the frequency of this.
  @mayanks: Hmm, do you have the log around where it dies? Is it OOM?
  @bagi.priyank: doesn't really say much. i didn't have all the logs. let me try and fetch those. worst case scenario i'll fetch them as it happens again.
  @bagi.priyank: sorry to misreport: the pinot servers with this log are not the ones reported dead in cluster manager.
@bagi.priyank: this is another thing i am seeing in the logs
```
2021/12/02 21:34:27.105 ERROR [LLRealtimeSegmentDataManager_km_mp_play_startree__48__1__20211202T1919Z] [km_mp_play_startree__48__1__20211202T1919Z] Holding after response from Controller: {"offset":-1,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"isSplitCommitType":false,"status":"NOT_SENT"}
Killed
```
i don't have the part before this right now but can update once it happens again.
  @mayanks: How are you running it? Seems like some external process is killing it (maybe due to a resource usage violation)?
  @mayanks: If so, you probably need to increase resources.
  @mayanks: Servers are not expected to die because of something that Pinot code did.
  @bagi.priyank: i am basically running the manual launch commands on multiple instances to run the broker, server, and controller. i have 8 servers, 3 brokers and 3 controller instances, each on an r4.xlarge ec2 instance (4 cores, 30.5 GB RAM and 1024 GB SSD disk)
  @bagi.priyank: ```
[pbagrecha@pinot-startree-server-dev-1003 apache-pinot-0.9.0-bin]$ bin/pinot-admin.sh StartServer -zkAddress 10.220.10.122:2191
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/pbagrecha/apache-pinot-0.9.0-bin/lib/pinot-all-0.9.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/pbagrecha/apache-pinot-0.9.0-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.9.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/pbagrecha/apache-pinot-0.9.0-bin/plugins/pinot-environment/pinot-azure/pinot-azure-0.9.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/pbagrecha/apache-pinot-0.9.0-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.9.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/pbagrecha/apache-pinot-0.9.0-bin/plugins/pinot-metrics/pinot-yammer/pinot-yammer-0.9.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/pbagrecha/apache-pinot-0.9.0-bin/plugins/pinot-metrics/pinot-dropwizard/pinot-dropwizard-0.9.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/home/pbagrecha/apache-pinot-0.9.0-bin/lib/pinot-all-0.9.0-jar-with-dependencies.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2021/12/02 21:33:52.382 INFO [StartServerCommand] [main] Executing command: StartServer -clusterName PinotCluster -serverHost null -serverPort 8098 -serverAdminPort 8097 -dataDir /tmp/data/pinotServerData -segmentDir /tmp/data/pinotSegments -zkAddress 10.220.10.122:2191
2021/12/02 21:33:52.415 INFO [StartServiceManagerCommand] [main] Executing command: StartServiceManager -clusterName PinotCluster -zkAddress 10.220.10.122:2191 -port -1 -bootstrapServices []
2021/12/02 21:33:52.416 INFO [StartServiceManagerCommand] [main] Starting a Pinot [SERVICE_MANAGER] at 1.032s since launch
2021/12/02 21:33:52.422 INFO [StartServiceManagerCommand] [main] Started Pinot [SERVICE_MANAGER] instance [ServiceManager_pinot-startree-server-dev-1003.localdomain_-1] at 1.037s since launch
2021/12/02 21:33:52.425 INFO [StartServiceManagerCommand] [Start a Pinot [SERVER]] Starting a Pinot [SERVER] at 1.04s since launch
Dec 02, 2021 9:34:57 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8097]
Dec 02, 2021 9:34:57 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
2021/12/02 21:36:27.182 INFO [StartServiceManagerCommand] [Start a Pinot [SERVER]] Started Pinot [SERVER] instance [Server_10.220.5.143_8098] at 155.798s since launch
Killed
```
  @bagi.priyank: ```
[pbagrecha@pinot-startree-server-dev-1003 apache-pinot-0.9.0-bin]$ echo $JAVA_OPTS
-Xms4G -Xmx16G -Dpinot.admin.system.exit=false -Dpinot.server.instance.realtime.alloc.offheap=true
```
  @bagi.priyank: seems like both cpu and memory usage are going up. is this something that can be mitigated by increasing the number of servers? or needs a bigger server? or both?

#pinot-dev


@chinmay.cerebro: @chinmay.cerebro has joined the channel

#announcements


@maarten.dubois: @maarten.dubois has joined the channel

#thirdeye-pinot


@maarten.dubois: @maarten.dubois has joined the channel