<h3><u>#general</u></h3><br><strong>@tirumalap: </strong>Hello everyone,
A very quick question:
I have lots of data in an S3 bucket (Parquet format). Can I use Pinot to retrieve
or query a single record based on a query or condition?
Is Pinot the right software for this?
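(For context: Pinot does not query the Parquet files in S3 in place; the data is first batch-ingested into Pinot segments, and a selective lookup is then just a filtered query. A sketch with hypothetical table and column names:
```
SELECT *
FROM myEvents
WHERE userId = 'u-12345'
  AND eventDate = '2020-10-01'
LIMIT 10
```
)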
<br><strong>@shane.fitzroy: </strong>Hi all, I work at an analytics start-up in Sydney, Australia that has just landed Series A investment. We're now re-architecting a particular product for scale and to support more features, moving from an MVP monolith with a Postgres DB to a more distributed system.

I'm looking to understand how Pinot may fit into a new stack and would like to 
know from an operations perspective how demanding a Pinot deployment might be, 
and how we might have to scale our engineering teams to support such a 
deployment in production.

If the context helps, it's a multi-tenant application that orchestrates some ETL+ML data pipelines with Spark/Databricks and the Gurobi optimiser. But essentially the outputs are "results" datasets (Parquet files on Azure Data Lake) of 50M~100M rows and 30~40 columns. We're aiming to support at least 500 users spread across several customers/tenants. We expect concurrent users to be generating/experimenting with new datasets regularly throughout the day (hourly). One of the web applications will be a front-end with a UI "data-grid" where users will want to perform exploratory/interactive analysis on results, so aggregations/group-bys/filtering/search/counts etc., at reasonably low latency.

On paper, Pinot looks like a great fit, but is it overkill for us? How many 
engineers would it take to support a deployment for our volume of 
data/ingestion? Note that ZooKeeper is not yet part of our stack. 
Sorry for the wall of text. Any advice or experience from others here would be 
greatly appreciated. Cheers.<br><strong>@igor.lema: </strong>Hi everyone, I’m 
looking to leverage Pinot in a simple Analytics use-case: allowing distinct 
counts, funnel analysis and anomaly detection of user click events from our App

Currently, at our somewhat large company, we are ingesting 100~200K events/second across 300 different (but defined) schemas; the biggest schema has about 40 columns, but the majority have fewer than 20. In this mix there are also at least 10% late events and duplicates. (2TB a day)
Currently we have more than 500 users querying this data in an exploratory/interactive fashion through our own front-end. With Pinot we hope to achieve sub-minute latency.

Pinot looks like the perfect fit for this use-case, since there is no need to join events, but my main doubt is: how big should this infrastructure be to support this volume? And how hard is it going to be to support a deployment at this volume?

I’m planning on deploying with K8s, using S3 as the segment store for Pinot. I also don’t need offline servers or any batch ingestion jobs<br><strong>@sudar.parakalan: </strong>@sudar.parakalan has joined the 
channel<br><strong>@sudarson.pr: </strong>@sudarson.pr has joined the 
channel<br><h3><u>#random</u></h3><br><strong>@sudar.parakalan: 
</strong>@sudar.parakalan has joined the channel<br><strong>@sudarson.pr: 
</strong>@sudarson.pr has joined the 
channel<br><h3><u>#feat-text-search</u></h3><br><strong>@a_pathak: 
</strong>@a_pathak has joined the 
channel<br><h3><u>#feat-presto-connector</u></h3><br><strong>@a_pathak: 
</strong>@a_pathak has joined the 
channel<br><h3><u>#troubleshooting</u></h3><br><strong>@a_pathak: 
</strong>@a_pathak has joined the channel<br><strong>@pradeepgv42: </strong>Hi, 
I am trying to optimize a query of the format
```select colA, distinctCountHll(colb)
from table
where timestamp &gt; X
group by colA```
We added a star-tree with
dimensionsSplitOrder: [“colA”]
and aggregateFunction as DistinctCountHLL__colB
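(For context: a star-tree definition like the one described above would normally sit under `tableIndexConfig.starTreeIndexConfigs` in the table config. A rough sketch using the column names from this thread; the surrounding fields are assumed defaults, not the poster's actual config:
```
"starTreeIndexConfigs": [
  {
    "dimensionsSplitOrder": ["colA"],
    "skipStarNodeCreationForDimensions": [],
    "functionColumnPairs": ["DISTINCT_COUNT_HLL__colB"],
    "maxLeafRecords": 10000
  }
]
```
)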

I am not seeing much query-time improvement. I am comparing against an aggregation by colC, which is not part of the star-tree, and I see very similar times. I can see that the star-tree index is getting generated.
Wondering if I am missing something?<br><strong>@g.kishore: 
</strong>@pradeepgv42 output stats?<br><strong>@pradeepgv42: 
</strong>timeMs=8541,docs=361440597/16479407613,entries=336003488/722881194,segments(queried/processed/matched/consuming/unavailable):5229/227/227/8/0,consumingFreshnessTimeMs=1601574112264,servers=4/4,groupLimitReached=false,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);172.31.17.90_R=1,8536,68556,0;172.31.30.139_O=1,3,372,0;172.31.34.149_O=1,4,372,0;172.31.24.127_R=1,8470,69174,0,<br><strong>@pradeepgv42:
</strong>2 servers have old data, so they don’t match anything<br><strong>@g.kishore: </strong>it’s still scanning a lot<br><strong>@pradeepgv42: </strong>yeah<br><strong>@g.kishore: 
</strong>@jackie.jxt ^^<br><strong>@ssubrama: </strong>we don't build star-tree in the realtime ingestion path, right?<br><strong>@yash.agarwal: </strong>Also I think timestamp should be part of the dimensionsSplitOrder, right?<br><strong>@jackie.jxt: </strong>Yes, Yash gives the 
answer<br><strong>@jackie.jxt: </strong>Can you try removing the filter on 
`timestamp` and see the latency?<br><strong>@pradeepgv42: </strong>ah, it’s a very big table; it killed our servers last time :slightly_smiling_face:<br><strong>@pradeepgv42: </strong>yeah, I thought that could be the reason too, so I included a minutes column as one of the dimensions yesterday evening<br><strong>@pradeepgv42: </strong>let me print the stats for 
that<br><strong>@pradeepgv42: </strong>this is for a much shorter time period (4 hours):
```
timeMs=1802,docs=62235061/16481320579,entries=98184718/124470122,segments(queried/processed/matched/consuming/unavailable):5230/57/57/8/0,consumingFreshnessTimeMs=1601574542463,servers=4/4,groupLimitReached=false,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);172.31.17.90_R=1,1797,68551,0;172.31.30.139_O=0,4,372,0;172.31.34.149_O=0,4,372,0;172.31.24.127_R=1,1563,69169,0,```
<br><strong>@pradeepgv42: </strong>is there a way to verify that star-tree is 
being used?<br><strong>@pradeepgv42: </strong>I think based on the results 
it’s not, but not sure why<br><strong>@jackie.jxt: </strong>You included 
`minutes` in the star-tree config?<br><strong>@jackie.jxt: </strong>What is the 
query?<br><strong>@pradeepgv42: </strong>```select colA, distinctCountHll(colb)
from table
where epochMinutes &gt; X
group by colA```
<br><strong>@pradeepgv42: </strong>epochMinutes is our minutes 
column<br><strong>@jackie.jxt: </strong>`dimensionsSplitOrder: ["colA", 
"epochMinutes"]`?<br><strong>@pradeepgv42: </strong>yup<br><strong>@jackie.jxt: 
</strong>Then it should be solved by the star-tree. Did this table have a star-tree before you made this change?<br><strong>@pradeepgv42: </strong>yes there was 
one already present<br><strong>@pradeepgv42: </strong>it contains 
dimensionsSplitOrder: [“colA”, “epochMinutes”] + some more columns, but the aggregate function is just a count<br><strong>@jackie.jxt: </strong>Then reloading won't help in that case...<br><strong>@pradeepgv42: </strong>yeah 
that’s why I am just looking at the newer data<br><strong>@jackie.jxt: 
</strong>Can you please check the metadata for the queried segments and see if 
the new star-tree is generated?<br><strong>@pradeepgv42: </strong>yup, I sent you the metadata in a personal message<br><strong>@pradeepgv42: </strong>could it be because colA and colB are present in both star-tree indices?<br><strong>@pradeepgv42: </strong>we most likely figured out the issue, thanks a lot @jackie.jxt
we have a hybrid table, the timestamp column gets auto-added to the filter, and the star-tree is not used in query execution
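(For anyone hitting the same issue: for a hybrid table the broker splits the query at a time boundary and adds a predicate on the table's time column to the OFFLINE and REALTIME parts. A rough sketch of the effect, with an illustrative boundary value T and exact comparison operators aside:
```
-- query as written against the hybrid table:
select colA, distinctCountHll(colB)
from table
where epochMinutes > X
group by colA
-- roughly what the servers receive:
--   offline part:  ... where epochMinutes > X and timestamp <= T
--   realtime part: ... where epochMinutes > X and timestamp > T
-- the auto-added predicate on `timestamp` is not in the star-tree's
-- dimensionsSplitOrder, so the star-tree cannot serve the query
```
)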
<br><h3><u>#lp-pinot-poc</u></h3><br><strong>@fx19880617: </strong>here I tried tuning kafka to 32 partitions and the pinot segment size to 400M<br><strong>@fx19880617: </strong>And from 
23:00 I changed the gc to `g1gc`<br><strong>@fx19880617: </strong>also I 
changed 
`pinot.server.instance.realtime.max.parallel.segment.builds=2`<br><strong>@fx19880617:
</strong>seems p99 increases when the consuming segments reach about half of the segment size, and then drops once all segments are flushed<br><strong>@fx19880617: </strong>p999 
is not shooting up during segment build after I changed the gc to 
`g1gc`<br><strong>@fx19880617: </strong>this is my current jvmOpts:
```-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=80:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml
-Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime 
-XX:+PrintGCApplicationConcurrentTime
-Xloggc:/opt/pinot/gc-pinot-server.log 
-Dlog4j2.configurationFile=/opt/pinot/conf/pinot-server-log4j2.xml```<br><strong>@mayanks:
</strong>I think Xms should be the same as Xmx<br><strong>@fx19880617: </strong>I can make that change as well
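(For reference, that is just aligning the initial heap flag with the max in the jvmOpts above, e.g.:
```
-Xms16G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200
```
with the remaining options left as posted.)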
<br><strong>@fx19880617: </strong>here are some slow queries:<br><strong>@fx19880617: </strong>```2020/10/01 18:46:07.776 INFO 
[QueryScheduler] [pqr-6] Processed 
requestId=8989635,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=490,totalExecMs=0,resSerMs=0,totalTimeMs=490,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-2.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=3,scanPostFilter=0,sched=fcfs
2020/10/01 18:46:07.777 INFO [QueryScheduler] [pqr-0] Processed 
requestId=8979151,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=520,totalExecMs=1,resSerMs=0,totalTimeMs=521,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=2,scanPostFilter=0,sched=fcfs
2020/10/01 18:46:07.777 INFO [QueryScheduler] [pqr-8] Processed 
requestId=8980815,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=535,totalExecMs=1,resSerMs=0,totalTimeMs=536,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=2,scanPostFilter=0,sched=fcfs
2020/10/01 18:46:07.777 INFO [QueryScheduler] [pqr-2] Processed 
requestId=8993187,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=543,totalExecMs=0,resSerMs=0,totalTimeMs=543,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-3.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=4,scanPostFilter=0,sched=fcfs
2020/10/01 18:46:07.789 INFO [QueryScheduler] [pqr-3] Processed 
requestId=8999077,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=564,totalExecMs=1,resSerMs=0,totalTimeMs=565,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-4.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=2,scanPostFilter=0,sched=fcfs```
<br><strong>@fx19880617: </strong>and gc log:<br><strong>@fx19880617: 
</strong>```2020-10-01T16:46:05.571+0000: 39322.336: Application time: 
4.9600067 seconds
2020-10-01T16:46:05.571+0000: 39322.336: [GC pause (G1 Evacuation Pause) 
(young), 1.0534652 secs]
   [Parallel Time: 1048.2 ms, GC Workers: 11]
      [GC Worker Start (ms): Min: 39322336.8, Avg: 39322336.9, Max: 39322337.1, 
Diff: 0.3]
      [Ext Root Scanning (ms): Min: 0.5, Avg: 0.8, Max: 1.4, Diff: 0.8, Sum: 
8.3]
      [Update RS (ms): Min: 24.1, Avg: 24.6, Max: 25.3, Diff: 1.2, Sum: 270.2]
         [Processed Buffers: Min: 14, Avg: 20.3, Max: 32, Diff: 18, Sum: 223]
      [Scan RS (ms): Min: 980.2, Avg: 980.8, Max: 981.0, Diff: 0.8, Sum: 
10788.9]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 
0.2]
      [Object Copy (ms): Min: 41.3, Avg: 41.4, Max: 41.5, Diff: 0.2, Sum: 455.7]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
         [Termination Attempts: Min: 1, Avg: 18.5, Max: 23, Diff: 22, Sum: 204]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 0.8]
      [GC Worker Total (ms): Min: 1047.5, Avg: 1047.7, Max: 1047.8, Diff: 0.3, 
Sum: 11524.3]
      [GC Worker End (ms): Min: 39323384.5, Avg: 39323384.6, Max: 39323384.7, 
Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.7 ms]
   [Other: 4.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.4 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 1.5 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 1.8 ms]
   [Eden: 744.0M(744.0M)-&gt;0.0B(744.0M) Survivors: 73728.0K-&gt;73728.0K 
Heap: 7260.6M(16384.0M)-&gt;6525.2M(16384.0M)]
 [Times: user=11.55 sys=0.00, real=1.05 secs]
2020-10-01T16:46:06.625+0000: 39323.390: Total time for which application 
threads were stopped: 1.0542157 seconds, Stopping threads took: 0.0001375 
seconds```
<br><strong>@fx19880617: </strong>not sure why `reqDeserMs=490` took this 
long<br><strong>@mayanks: </strong>P99 increasing with the number of in-memory rows tells me the query still fans out to &gt; 1 RT nodes<br><strong>@fx19880617: 
</strong>```2020/10/01 18:27:44.803 INFO [QueryScheduler] [pqr-6] Processed 
requestId=8781331,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/1,schedulerWaitMs=10,reqDeserMs=0,totalExecMs=1,resSerMs=0,totalTimeMs=11,minConsumingFreshnessMs=1601576864792,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=16,scanPostFilter=0,sched=fcfs
2020/10/01 18:27:44.804 INFO [QueryScheduler] [pqr-10] Processed 
requestId=8802075,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/1/1,schedulerWaitMs=8,reqDeserMs=1,totalExecMs=2,resSerMs=0,totalTimeMs=11,minConsumingFreshnessMs=1601576864802,broker=Broker_pinot-broker-4.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=1,scanInFilter=12,scanPostFilter=0,sched=fcfs
2020/10/01 18:27:48.020 INFO [QueryScheduler] [pqr-6] Processed 
requestId=8781573,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/1,schedulerWaitMs=1,reqDeserMs=750,totalExecMs=2,resSerMs=0,totalTimeMs=753,minConsumingFreshnessMs=1601576648403,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=0,scanPostFilter=0,sched=fcfs```
<br><strong>@fx19880617: </strong>some queries spend more time in schedulerWaitMs<br><strong>@mayanks: </strong>How many RT servers were queried?<br><strong>@fx19880617: </strong>```2020/10/01 19:06:31.921 INFO 
[BaseBrokerRequestHandler] [jersey-server-managed-async-executor-570] 
requestId=9289631,table=events_REALTIME,timeMs=2,docs=0/4781010,entries=30/0,segments(queried/processed/matched/consuming/unavailable):8/8/0/0/0,consumingFreshnessTimeMs=0,servers=3/3,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-3_R=0,1,336,0;pinot-server-7_R=0,2,338,0;pinot-server-6_R=0,2,337,1,query=SELECT
 COUNT(*) FROM events WHERE timestampMillis &gt; 0 AND userId = 
'11739104d64b7441707fe00c' AND eventName = '1050977c37f0'
2020/10/01 19:06:31.928 INFO [BaseBrokerRequestHandler] 
[jersey-server-managed-async-executor-570] 
requestId=9289632,table=events_REALTIME,timeMs=2,docs=0/20480009,entries=25/0,segments(queried/processed/matched/consuming/unavailable):18/18/0/1/0,consumingFreshnessTimeMs=1601579191927,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-5_R=0,1,337,0;pinot-server-3_R=0,1,337,0;pinot-server-7_R=0,2,337,0;pinot-server-0_R=0,2,422,0;pinot-server-6_R=0,1,337,0,query=SELECT
 COUNT(*) FROM events WHERE timestampMillis &gt; 0 AND userId = 
'1857951479fd79dcb8756b28' AND eventName = '164790de7191'
2020/10/01 19:06:31.930 INFO [BaseBrokerRequestHandler] 
[jersey-server-managed-async-executor-563] 
requestId=9289633,table=events_REALTIME,timeMs=2,docs=0/4755282,entries=20/0,segments(queried/processed/matched/consuming/unavailable):8/8/0/0/0,consumingFreshnessTimeMs=0,servers=4/4,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-5_R=0,1,337,0;pinot-server-3_R=0,1,336,0;pinot-server-7_R=0,2,337,0;pinot-server-6_R=0,2,336,0,query=SELECT
 COUNT(*) FROM events WHERE timestampMillis &gt; 0 AND userId = 
'132823755b93eb49f1ce2b5b' AND eventName = '12264662576e'
2020/10/01 19:06:31.934 INFO [BaseBrokerRequestHandler] 
[jersey-server-managed-async-executor-563] 
requestId=9289634,table=events_REALTIME,timeMs=2,docs=0/4755282,entries=10/0,segments(queried/processed/matched/consuming/unavailable):8/8/0/0/0,consumingFreshnessTimeMs=0,servers=3/3,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-5_R=0,0,336,0;pinot-server-3_R=0,1,337,0;pinot-server-7_R=0,1,337,0,query=SELECT
 COUNT(*) FROM events WHERE timestampMillis &gt; 0 AND userId = 
'1548223777e5a95db4d5a606' AND eventName = '1209099b8c2d'
2020/10/01 19:06:31.936 INFO [BaseBrokerRequestHandler] 
[jersey-server-managed-async-executor-570] 
requestId=9289635,table=events_REALTIME,timeMs=2,docs=0/20534651,entries=98/0,segments(queried/processed/matched/consuming/unavailable):18/18/0/1/0,consumingFreshnessTimeMs=1601579191931,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-5_R=1,0,336,0;pinot-server-3_R=1,1,338,0;pinot-server-4_R=1,1,423,0;pinot-server-7_R=1,1,339,0;pinot-server-6_R=1,1,338,0,query=SELECT
 COUNT(*) FROM events WHERE timestampMillis &gt; 0 AND userId = 
'10995588e3bc063feb373401' AND eventName = '141905eadb6b'```
<br><strong>@fx19880617: </strong>I think only 1 server<br><strong>@fx19880617: 
</strong>there are segments moved to offline servers<br><strong>@mayanks: </strong>why is CPU usage across servers so unbalanced?<br><strong>@mayanks: </strong>aren't all servers doing the same/similar work?<br><strong>@mayanks: </strong>Yeah, so the query still fans out to multiple RT nodes: 
`server-7_R=0,2,337,0;pinot-server-6_R=0,2,336,0`<br><strong>@mayanks: 
</strong>This is in line with what I have seen before. If the fanout is higher, then as the number of in-memory rows increases, p99 increases. This is because some of the servers in the fanout are doing GC<br><strong>@fx19880617: </strong>4 are 
consuming, 4 are serving offline segments moved from realtime<br><strong>@fx19880617: </strong>servers 0/1/2/4 are consuming<br><strong>@fx19880617: </strong>3/5/6/7 are just serving queries<br><strong>@fx19880617: </strong>the CPU usage of 0/1/2/4 is higher<br><strong>@mayanks: </strong>Right, consumption takes more 
CPU<br><strong>@mayanks: </strong>Ok, this makes sense<br><strong>@mayanks: 
</strong>However, num docs scanned is quite different across servers 
too<br><strong>@mayanks: </strong>All offline should be in one range, and all 
RT should be in another range<br><strong>@fx19880617: </strong>then I think it 
actually makes sense?<br><strong>@fx19880617: </strong>most of the segments are moved to OFFLINE<br>
