Apache Pinot Daily Email Digest (2021-10-12)

Pinot Slack Email Digest Tue, 12 Oct 2021 19:00:37 -0700

#general

@surajkmth29: Hi Folks, We are trying to construct a tabular view from data in pinot. Eg: Get the list of top 10 userId's from Table A, get names of those users using lookup from Table B. Is this supported using lookup?
@g.kishore: Yes. Lookup joins are supported. In your case, it might be better to use sub query in the filter clause
@tharun.3c: @tharun.3c has joined the channel
@chxing: Hi Folks， I am using prestodb, but fond an issue when using order by , Could anyone help check?
@dadelcas: Can you add some logs? That may help to find the issue. See if adding --debug to your presto CLI the query prints a stack trace
@xiangfu0: Can you explain this query in presto and use right arrow to see the generated Pinot query?
@kennybastani: @chxing please try removing the semi-colon at the end of the query. I've found this to be an issue with `ORDER-BY` queries recently. Looking into it further.
@chxing: Hi @xiangfu0 `presto:default> explain select * from matsmeeting order by jobprocesstime desc;` ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- - Output[servicetype, hostuserid, confopentimets, modifiedcount, endtimewwpts, hasrecording, confopentime, confname, haschat, h264sharing, endtimewwp, isenablewebcast, state, closerea - RemoteStreamingMerge[jobprocesstime DESC_NULLS_LAST] => [servicetype:varchar, hostuserid:varchar, confopentimets:timestamp, modifiedcount:integer, endtimewwpts:timestamp, hasrec - LocalMerge[jobprocesstime DESC_NULLS_LAST] => [servicetype:varchar, hostuserid:varchar, confopentimets:timestamp, modifiedcount:integer, endtimewwpts:timestamp, hasrecording - PartialSort[jobprocesstime DESC_NULLS_LAST] => [servicetype:varchar, hostuserid:varchar, confopentimets:timestamp, modifiedcount:integer, endtimewwpts:timestamp, hasreco - RemoteStreamingExchange[REPARTITION] => [servicetype:varchar, hostuserid:varchar, confopentimets:timestamp, modifiedcount:integer, endtimewwpts:timestamp, hasrecordi Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: ?} - TableScan[TableHandle {connectorId=‘pinot’, connectorHandle=’PinotTableHandle{connectorId=pinot, schemaName=default, tableName=matsmeeting, isQueryShort=Optional Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00} attendeenum := PinotColumnHandle{columnName=attendeeNum, dataType=integer, type=REGULAR} siteid := PinotColumnHandle{columnName=siteId, dataType=bigint, type=REGULAR} ints := PinotColumnHandle{columnName=ints, dataType=timestamp, type=REGULAR} meetingtype := PinotColumnHandle{columnName=meetingType, dataType=varchar, type=REGULAR} confopentime := PinotColumnHandle{columnName=confOpenTime, dataType=varchar, type=REGULAR} reporttime := PinotColumnHandle{columnName=reportTime, dataType=varchar, type=REGULAR} webexdeviceidtype := PinotColumnHandle{columnName=webexDeviceIdType, dataType=varchar, type=REGULAR} hostuserid := PinotColumnHandle{columnName=hostUserId, dataType=varchar, type=REGULAR} hasdocumentsharing := PinotColumnHandle{columnName=hasDocumentSharing, dataType=varchar, type=REGULAR} userid := PinotColumnHandle{columnName=userId, dataType=bigint, type=REGULAR} confid := PinotColumnHandle{columnName=confId, dataType=bigint, type=REGULAR} jobprocessts := PinotColumnHandle{columnName=jobProcessTS, dataType=timestamp, type=REGULAR} hastpuser := PinotColumnHandle{columnName=hasTPUser, dataType=varchar, type=REGULAR} ismct := PinotColumnHandle{columnName=isMCT, dataType=varchar, type=REGULAR} endtimewwpts := PinotColumnHandle{columnName=endTimeWWPTS, dataType=timestamp, type=REGULAR} hasapplicationsharing := PinotColumnHandle{columnName=hasApplicationSharing, dataType=varchar, type=REGULAR} modifiedcount := PinotColumnHandle{columnName=modifiedCount, dataType=integer, type=REGULAR} featurename := PinotColumnHandle{columnName=featureName, dataType=varchar, type=REGULAR} closereason := PinotColumnHandle{columnName=closeReason, dataType=varchar, type=REGULAR} lts := PinotColumnHandle{columnName=lts, dataType=timestamp, type=REGULAR} statewwp := PinotColumnHandle{columnName=stateWWP, dataType=integer, type=REGULAR} endtimets := PinotColumnHandle{columnName=endTimeTS, dataType=timestamp, type=REGULAR} hostnodeid := PinotColumnHandle{columnName=hostNodeId, dataType=varchar, type=REGULAR} endtimewwp := PinotColumnHandle{columnName=endTimeWWP, dataType=varchar, type=REGULAR} hasrecording := PinotColumnHandle{columnName=hasRecording, dataType=varchar, type=REGULAR} numwbxusers := PinotColumnHandle{columnName=numWbxusers, dataType=integer, type=REGULAR} jobpoolname := PinotColumnHandle{columnName=jobPoolName, dataType=varchar, type=REGULAR} clusterid := PinotColumnHandle{columnName=clusterId, dataType=varchar, type=REGULAR} duration := PinotColumnHandle{columnName=duration, dataType=integer, type=REGULAR} haschat := PinotColumnHandle{columnName=hasChat, dataType=varchar, type=REGULAR} env := PinotColumnHandle{columnName=env, dataType=varchar, type=REGULAR} cmrversioncode := PinotColumnHandle{columnName=cmrVersionCode, dataType=varchar, type=REGULAR} isenablewebcast := PinotColumnHandle{columnName=IsEnableWebcast, dataType=varchar, type=REGULAR} jobprocesstime := PinotColumnHandle{columnName=jobProcessTime, dataType=varchar, type=REGULAR} lastmodifiedtime := PinotColumnHandle{columnName=lastModifiedTime, dataType=varchar, type=REGULAR} nbrenabled := PinotColumnHandle{columnName=NBREnabled, dataType=varchar, type=REGULAR} confname := PinotColumnHandle{columnName=confName, dataType=varchar, type=REGULAR} ts := PinotColumnHandle{columnName=ts, dataType=timestamp, type=REGULAR} starttimets := PinotColumnHandle{columnName=startTimeTS, dataType=timestamp, type=REGULAR} isspacemeeting := PinotColumnHandle{columnName=isSpaceMeeting, dataType=varchar, type=REGULAR} tpserver := PinotColumnHandle{columnName=tpServer, dataType=varchar, type=REGULAR} endtime := PinotColumnHandle{columnName=endTime, dataType=varchar, type=REGULAR} sitename := PinotColumnHandle{columnName=siteName, dataType=varchar, type=REGULAR} starttime := PinotColumnHandle{columnName=startTime, dataType=varchar, type=REGULAR} state := PinotColumnHandle{columnName=state, dataType=integer, type=REGULAR} confopentimets := PinotColumnHandle{columnName=confOpenTimeTS, dataType=timestamp, type=REGULAR} haspolling := PinotColumnHandle{columnName=hasPolling, dataType=varchar, type=REGULAR} componentaddress := PinotColumnHandle{columnName=componentAddress, dataType=varchar, type=REGULAR} meetingtypecode := PinotColumnHandle{columnName=meetingTypeCode, dataType=varchar, type=REGULAR} hasfilesharing := PinotColumnHandle{columnName=hasFileSharing, dataType=varchar, type=REGULAR} isenableevent := PinotColumnHandle{columnName=IsEnableEvent, dataType=varchar, type=REGULAR} numtpusers := PinotColumnHandle{columnName=numTpUsers, dataType=integer, type=REGULAR} h264sharing := PinotColumnHandle{columnName=H264Sharing, dataType=varchar, type=REGULAR} servicetype := PinotColumnHandle{columnName=serviceType, dataType=varchar, type=REGULAR} mconline := PinotColumnHandle{columnName=MCONLINE, dataType=varchar, type=REGULAR} mtgserver := PinotColumnHandle{columnName=mtgServer, dataType=varchar, type=REGULAR} meetingkey := PinotColumnHandle{columnName=meetingKey, dataType=varchar, type=REGULAR} supportwebexcloudproximity := PinotColumnHandle{columnName=supportWebexCloudProximity, dataType=varchar, type=REGULAR} closereasoncode := PinotColumnHandle{columnName=closeReasonCode, dataType=varchar, type=REGULAR} clientipaddr := PinotColumnHandle{columnName=clientIpAddr, dataType=varchar, type=REGULAR}
@chxing: Hi @dadelcas here is debug info `presto:default> select * from matsmeeting order by jobprocesstime desc;` Query 20211013_011448_00018_92gth, FAILED, 1 node Splits: 118 total, 0 done (0.00%) CPU Time: 0.0s total, 0 rows/s, 0B/s, 0% active Per Node: 0.0 parallelism, 0 rows/s, 0B/s Parallelism: 0.0 Peak Memory: 0B 0:00 [0 rows, 0B] [0 rows/s, 0B/s] Query 20211013_011448_00018_92gth failed: null value in entry: Server_sj1-pinot-server-34_8098=null java.lang.NullPointerException: null value in entry: Server_sj1-pinot-server-34_8098=null at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32) at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:42) at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:72) at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:124) at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:458) at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437) at com.facebook.presto.pinot.PinotSegmentPageSource.queryPinot(PinotSegmentPageSource.java:242) at com.facebook.presto.pinot.PinotSegmentPageSource.fetchPinotData(PinotSegmentPageSource.java:214) at com.facebook.presto.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:161) at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:252) at com.facebook.presto.operator.Driver.processInternal(Driver.java:418) at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301) at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722) at com.facebook.presto.operator.Driver.processFor(Driver.java:294) at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077) at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162) at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599) at com.facebook.presto.$gen.Presto_0_261_SNAPSHOT_4175e47____20211008_012118_1.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
@chxing: @kennybastani I found if i remove semi-colon, i can’t run commend in cli
@chxing:
@kennybastani: Ah okay. There is a bug in 0.9.0 where a semi-colon at the end of a query in the browser does result in an error. But this looks very much related to Presto.
@chxing: Hi @kennybastani It’s strange that only failed when query contain order-by?
@kennybastani: The exception doesn’t seem to be happening in the Pinot runtime. This looks like a bug in Presto. I’m not very familiar with Presto’s code so I think someone else might be better able to help you here. I would, however, try running your query in the Pinot query console and see if it returns any results.
@chxing: Thx @kennybastani Do we have presto coder in this room? :grinning:
@chxing: seems select is ok, but failed in order by
@nkuptsov: @nkuptsov has joined the channel
@talgab: @talgab has joined the channel
@luisfernandez: hey friends, our team is trying to query pinot from the query console, and we are trying to understand some of the latency in the queries we are currently executing a query like the following : `select * from table where user_id = x` when we first hit a query like this we get more than 500ms after we hit it again we get good results i guess it’s because the segment gets closer to memory, i was wondering why something like this would happen 500ms is def out of our expectations for query latency, our current configuration of the table has indexing and it’s a real time table. our current config for noDictionaryColumns ``` "noDictionaryColumns": [ "click_count", "impression_count", ],``` so that we can aggregate in our dimensions using “aggregateMetrics” : true segment flushing config configurations: ```"realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.time": "24h", "realtime.segment.flush.segment.size": "250M"``` we have rangeIndex in our serve_time which is an epoch timestamp to the hour. we have an invertexindex on the user_id and sortedcolumn as well as a partition map with 4 partitions with modulo. we chose 4 partitions because the consuming topic has 4 partitions. the consuming topic is getting around 5k messages a second. finally we currently have 2 servers with 4gigs of heap for java and 10g in the machine itself 4 cpu and 500G of disk space. at the moment of writing this message we have 96 segments in this table. metrics from what we issue a query like the one seen above: ```timeUsedMs numDocsScanned totalDocs numServersQueried numServersResponded numSegmentsQueried numSegmentsProcessed numSegmentsMatched numConsumingSegmentsQueried numEntriesScannedInFilter numEntriesScannedPostFilter numGroupsLimitReached partialResponse minConsumingFreshnessTimeMs offlineThreadCpuTimeNs realtimeThreadCpuTimeNs 264 40 401325330 2 2 93 93 4 1 0 320 false - 1634050463550 0 159743463``` could anyone direct me into what to look into even this queries based on the trouble shooting steps don’t seem to have much numDocsScanned and numEntriesScannedPostFilter
@richard892: is this 500ms cold start consistent even when the JVMs have been running for a long time?
@luisfernandez: it’s
@luisfernandez: 3 digits latency
@richard892: I'm asking whether this is JVM warmup
@luisfernandez: i don’t think it’s once i grab a different user_id i get again 3 digits latency
@richard892: ok
@luisfernandez: pinot has been running for a while in our cluster if that’s what you are asking
@richard892: what's the cardinality of `user_id` ?
@richard892: if it's very high, there will be a lot of bitmaps in the inverted index on `user_id` which will incur some set up costs (what version are you using, by the way)? Which dimension are you sorting by? Would it be possible to sort by `user_id` so you can use a sorted index instead?
@luisfernandez: it’s on the hundreds of thousands? we usually sort by impression_count or click_count
@luisfernandez: so would sort by user_id, impression_count help at all?
@luisfernandez: i’m on 0.8.0
@richard892: the fastest index you can have is a sorted index, so it depends what the most important queries are. If this is a really important query, I would investigate sorting by `user_id` and in 0.9.0 the new range index should have much lower latency for `impression_count` and `click_count`
@richard892: high cardinalities _can_ cause issues for inverted indexes in most systems (we don't know if this is the root cause yet), because while each of the bitmaps is highly selective you have hundreds of thousands of them.
@richard892: could you run queries for `user_id`s you haven't queried before with tracing and use the JSON format view and paste the trace at the bottom here, so we can see which operators take most time?
@luisfernandez: ``` "exceptions": [], "numServersQueried": 2, "numServersResponded": 2, "numSegmentsQueried": 93, "numSegmentsProcessed": 93, "numSegmentsMatched": 4, "numConsumingSegmentsQueried": 1, "numDocsScanned": 40, "numEntriesScannedInFilter": 0, "numEntriesScannedPostFilter": 320, "numGroupsLimitReached": false, "totalDocs": 404198690, "timeUsedMs": 505, "offlineThreadCpuTimeNs": 0, "realtimeThreadCpuTimeNs": 294557175, "segmentStatistics": [], "traceInfo": {}, "numRowsResultSet": 10, "minConsumingFreshnessTimeMs": 1634052628512```
@luisfernandez: that’s without order by user_id
@luisfernandez: oh wait forgot tracing lol
@richard892: Also, it would be interesting to get a cpu profile by installing async-profiler as a native agent as outlined , setting ```-agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=cpu.html``` while you run a load of queries which always query a user_id for the first time (better automated) for about a minute or so, we can see exactly what's going on
@luisfernandez: ``` "exceptions": [], "numServersQueried": 2, "numServersResponded": 2, "numSegmentsQueried": 93, "numSegmentsProcessed": 93, "numSegmentsMatched": 4, "numConsumingSegmentsQueried": 1, "numDocsScanned": 40, "numEntriesScannedInFilter": 0, "numEntriesScannedPostFilter": 320, "numGroupsLimitReached": false, "totalDocs": 404233610, "timeUsedMs": 452, "offlineThreadCpuTimeNs": 0, "realtimeThreadCpuTimeNs": 181968373, "segmentStatistics": [], "traceInfo": { "pinot-server-1.pinot-server-headless.pinot.svc.cluster.local": "[{\"0\":[{\"SelectionOnlyCombineOperator Time\":181},{\"InstanceResponseOperator Time\":182}]},{\"0_0\":[]},{\"0_1\":[]},{\"0_2\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"SelectionOnlyOperator Time\":51}]},{\"0_3\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"SelectionOnlyOperator Time\":180}]}]", "pinot-server-0.pinot-server-headless.pinot.svc.cluster.local": "[{\"0\":[{\"SelectionOnlyCombineOperator Time\":1},{\"InstanceResponseOperator Time\":1}]},{\"0_0\":[]},{\"0_1\":[]},{\"0_3\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"SelectionOnlyOperator Time\":1}]},{\"0_2\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"SelectionOnlyOperator Time\":1}]}]" }, "numRowsResultSet": 10, "minConsumingFreshnessTimeMs": 1634052785732```
@richard892: ok so it looks like `SelectionOnlyCombineOperator` and `SelectionOnlyOperator` are the problems in this query, forget everything I said about indexes
@richard892: getting profiles of your system doing the slow query would help to figure this out at this point
@g.kishore: whats the query for above response?
@luisfernandez: `select * from table where user_id = x`
@luisfernandez: for my own education what is that? `SelectionOnlyCombineOperator` and `SelectionOnlyOperator`
@richard892: Is this on AWS by the way? What instance type are you using?
@g.kishore: ```"numSegmentsQueried": 93, "numSegmentsProcessed": 93, "numSegmentsMatched": 4,```
@luisfernandez: this is on kubernetes, google instances.
@g.kishore: you can do a lot better with pruning
@g.kishore: partitioning/bloomfilter etc will help a lot
@luisfernandez: to decrease `"numSegmentsQueried": 93`
@luisfernandez: ?
@g.kishore: no, to reduce "numSegmentsProcessed": 93,
@luisfernandez: we have partitioning 4 partitions only tho, i thought it had to match the kafka topic but that’s not the case yea?
@luisfernandez: and add bloomfilter on user_id you mean yea
@g.kishore: yes,
@g.kishore: if either do partitioning + sorting or
@g.kishore: bloomfilter + sorting
@g.kishore: you dont need inverted index for this
@luisfernandez: and if i add this now it will only apply to new segments yea? even if i reload segments
@luisfernandez: and either partitioning + sorting or bloomfilter + sorting, inverted index not required.
@g.kishore: Bloomfilter will be applied to existing segments as well
@luisfernandez: i just have to push reload segments yea?
@g.kishore: Yes, invoke the reload all segments
@luisfernandez: also one question, the number of partitions we setup in pinot doesn’t have to be equal to the one in kafka yea?
@g.kishore: Pinot does not need any setup for partitions
@luisfernandez: doens’t it need this ```"segmentPartitionConfig": { "columnPartitionMap": {```
@g.kishore: It derives it automatically from Kafka or in batch mode, you can partition the data and push it to Pinot
@g.kishore: Yes.. but number of partitions is not needed
@luisfernandez: ohh cause it already partitions by whatever kafka is partitioning by?
@luisfernandez: and in this case i’m consuming from kafka
@g.kishore: Right
@luisfernandez: this wouldn’t be needed as well?
@luisfernandez: ``` "routing": { "segmentPrunerTypes": [ "partition" ] },```
@kennybastani: Hey, you probably tried this @luisfernandez, but can you put a `LIMIT` clause on your query. Let's see how it affects the query response time. I've noticed some wonkiness lately related to this.
@g.kishore: @luisfernandez Maybe, I am not sure but we should able to apply that automatically cc @mayanks
@mayanks: Glanced over the thread, here's my initial input: ```- Definitely sort on user_id if you are not doing so. - If high throughput, partition on user_id as well. - The partition function does need to be specified and matched for partition based pruning.```
@mayanks: @luisfernandez can you confirm if the data is sorted on user_id or not?
@luisfernandez: do we add bloom filter too?
@luisfernandez: you mean just by issuing a regular select yea?
@luisfernandez: also @mayanks does the partition has to match the kafka partitioning count? or it doesn’t matter
@mayanks: No, I meant if you are sorting on user_id in offline, and specified as sorted column for RT?
@mayanks: @luisfernandez The count may not need to match, the function impl needs to (we use Murmur that matches Kafka)
@luisfernandez: yes sorted_column is user_id but in the query i’m not ordering by user_id perse
@mayanks: What's your read/write qps?
@luisfernandez: 150qps
@mayanks: do you have offline?
@luisfernandez: approx that i think it’s even lesser
@mayanks: And what's the typical number of rows selected for a user?
@mayanks: And is the slow start specific to select query or count(*) also?
@aabuda: @aabuda has joined the channel
@yangguji: @yangguji has joined the channel
@hardike: @hardike has joined the channel

#random

@tharun.3c: @tharun.3c has joined the channel
@nkuptsov: @nkuptsov has joined the channel
@talgab: @talgab has joined the channel
@aabuda: @aabuda has joined the channel
@yangguji: @yangguji has joined the channel
@hardike: @hardike has joined the channel

#troubleshooting

@tharun.3c: @tharun.3c has joined the channel
@surajkmth29: Hi Folks, We are trying to construct a tabular view from data in pinot. Eg: Get the list of top 10 userId's from Table A, get the names of those users using lookup from Table B. Is this supported using lookup?
@nkuptsov: @nkuptsov has joined the channel
@talgab: @talgab has joined the channel
@aabuda: @aabuda has joined the channel
@yangguji: @yangguji has joined the channel
@hardike: @hardike has joined the channel

#pinot-dev

@yangguji: @yangguji has joined the channel
--------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]

Apache Pinot Daily Email Digest (2021-10-12)

#general

#random

#troubleshooting

#pinot-dev

Reply via email to