Apache Pinot Daily Email Digest (2020-08-14)

Pinot Slack Email Digest Fri, 14 Aug 2020 19:01:16 -0700

<h3><u>#general</u></h3><br><strong>@apsingekar.venkatesh: 
</strong>@apsingekar.venkatesh has joined the 
channel<br><strong>@sundar.djeabalane2: </strong>Hi Everyone ! I’m looking for 
options to expose presto (presto coordinator) outside kubernetes with some 
basic auth. we are currently running Pinot and presto inside kubernetes. We 
have a requirement where we want to expose presto outside as a service so  
clients can connect via presto consume the data. please let me know if anyone 
has implemented the authentication at the presto 
layer.<br><h3><u>#random</u></h3><br><strong>@apsingekar.venkatesh: 
</strong>@apsingekar.venkatesh has joined the 
channel<br><h3><u>#feat-text-search</u></h3><br><strong>@harrynet1989: 
</strong>@harrynet1989 has joined the channel<br><strong>@apsingekar.venkatesh: 
</strong>@apsingekar.venkatesh has joined the 
channel<br><h3><u>#feat-rt-seg-complete</u></h3><br><strong>@harrynet1989: 
</strong>@harrynet1989 has joined the 
channel<br><h3><u>#feat-presto-connector</u></h3><br><strong>@apsingekar.venkatesh:
 </strong>@apsingekar.venkatesh has joined the 
channel<br><h3><u>#troubleshooting</u></h3><br><strong>@apsingekar.venkatesh: 
</strong>@apsingekar.venkatesh has joined the channel<br><strong>@pradeepgv42: 
</strong>QQ on star tree index: does just specifying `dimensionsSplitOrder` 
include only the dimensions in this list?
Or would we also have to specify `skipStarNodeCreationForDimensions` to keep 
other dimensions from being included in the star tree?<br><strong>@g.kishore: 
</strong>that's right<br><strong>@pradeepgv42: </strong>got it, 
thanks<br><strong>@andrew: </strong>@andrew has joined the 
channel<br><strong>@andrew: </strong>where can I find logs for each query? 
I'm trying to troubleshoot slow queries.

I tried this but had no luck with any of the brokers: `$ kubectl -n pinot logs 
pinot-broker-6 broker -f`<br><strong>@sosyalmedya.oguzhan: </strong>In batch 
ingestion, the table is created automatically if it does not exist, 
right?<br><strong>@fx19880617: </strong>no<br><strong>@fx19880617: 
</strong>the table is required for fetching required information, e.g. schema and table 
configs<br><strong>@mayanks: </strong>Yes, then you can check what was 
happening on that server at that time<br><strong>@pradeepgv42: </strong>some 
more questions on the star-tree index: does reload create the star-tree for old 
segments? And how do I verify that the star-tree index is generated for the 
segments?
Basically I am not seeing any improvement in query times after reload, so I am wondering 
if I am missing 
something<br><h3><u>#aggregators</u></h3><br><strong>@harrynet1989: 
</strong>@harrynet1989 has joined the 
channel<br><h3><u>#enable-generic-offsets</u></h3><br><strong>@harrynet1989: 
</strong>@harrynet1989 has joined the 
channel<br><h3><u>#pinot-dev</u></h3><br><strong>@harrynet1989: 
</strong>@harrynet1989 has joined the channel<br><strong>@jlli: </strong>Hey it 
seems the Swagger rest API console isn’t working. Can someone working on UI 
help take a look at it?<br><strong>@g.kishore: </strong>It works with 
QuickStart<br><h3><u>#announcements</u></h3><br><strong>@harrynet1989: 
</strong>@harrynet1989 has joined the 
channel<br><h3><u>#release-certifier</u></h3><br><strong>@mayanks: </strong>Hey 
guys, sorry if I am missing something obvious, but is there a plan explaining the 
overall goal, with a definition of 'done', along with short/medium/long-term 
milestones?<br><strong>@mayanks: </strong>We do have real issues that keep 
popping up in production today as new code gets checked in. It would be really 
helpful to ensure that at least our immediate/short term goals will indeed help 
address the issues we are actually 
seeing.<br><h3><u>#lp-pinot-poc</u></h3><br><strong>@andrew: </strong>@andrew 
has joined the channel<br><strong>@g.kishore: </strong>@g.kishore has joined 
the channel<br><strong>@mayanks: </strong>@mayanks has joined the 
channel<br><strong>@fx19880617: </strong>@fx19880617 has joined the 
channel<br><strong>@jackie.jxt: </strong>@jackie.jxt has joined the 
channel<br><strong>@andrew: </strong>@andrew set the channel purpose: Discuss 
Leanplum Pinot POC<br><strong>@g.kishore: </strong>@fx19880617 @mayanks Andrew 
is doing a perf benchmark<br><strong>@g.kishore: </strong>he has already 
enabled partitioning and segment relocator<br><strong>@mayanks: 
</strong>Nice<br><strong>@g.kishore: </strong>most of the queries take a few 
ms<br><strong>@g.kishore: </strong>but the p99.9 shows some queries take 
10sec<br><strong>@g.kishore: </strong>@andrew can you paste the 
graph<br><strong>@andrew: </strong><br><strong>@mayanks: </strong>How's the 
GC?<br><strong>@mayanks: </strong>Is replica-group based routing 
enabled?<br><strong>@mayanks: </strong>to reduce fanout?<br><strong>@g.kishore: 
</strong>that's what we suspect, either GC or it's segment 
flush<br><strong>@g.kishore: </strong>this is real-time 
only<br><strong>@mayanks: </strong>ok<br><strong>@mayanks: </strong>how many 
nodes is the query fanning out to?<br><strong>@g.kishore: </strong>so it only 
queries 1 node<br><strong>@g.kishore: </strong>per 
partition<br><strong>@mayanks: </strong>ok good<br><strong>@g.kishore: 
</strong>this was before segment relocation<br><strong>@jackie.jxt: </strong>Are 
all the queries of the same pattern?<br><strong>@g.kishore: </strong>not sure 
after relocation was enabled<br><strong>@g.kishore: 
</strong>yes<br><strong>@andrew: </strong>the queries are of the form `SELECT 
COUNT(*) FROM events WHERE timestampMillis &gt; 0 AND userId = 
'14068106f053d26d101efef9' AND eventName = '100268a9e49b'` . The table is 
partitioned on userId<br><strong>@mayanks: </strong>is user id 
string?<br><strong>@andrew: </strong>yes<br><strong>@mayanks: </strong>ok, 
that's fine<br><strong>@mayanks: </strong>how many rows per user on 
average?<br><strong>@andrew: </strong>the data is generated randomly. the user 
IDs are distributed normally and the eventNames follow an exponential 
distribution<br><strong>@andrew: </strong>i’m starting my test with no data, so 
usually just a few rows<br><strong>@mayanks: </strong>@jackie.jxt does RT use 
off-heap by default?<br><strong>@mayanks: </strong>if it's GC on consuming nodes, we 
should check if off-heap is being used for RT<br><strong>@g.kishore: 
</strong>let's validate if it's GC<br><strong>@mayanks: 
</strong>yeah<br><strong>@jackie.jxt: </strong>Seems 
not<br><strong>@jackie.jxt: </strong>`_isRealtimeOffheapAllocation` in 
`IndexLoadingConfig`<br><strong>@andrew: </strong>it’s easy to reproduce in 
one-off queries too. Here’s an example I just ran in the UI:

```{
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "LONG"
      ],
      "columnNames": [
        "count(*)"
      ]
    },
    "rows": [
      [
        3
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 2,
  "numServersResponded": 2,
  "numSegmentsQueried": 2,
  "numSegmentsProcessed": 2,
  "numSegmentsMatched": 1,
  "numConsumingSegmentsQueried": 1,
  "numDocsScanned": 3,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 0,
  "numGroupsLimitReached": false,
  "totalDocs": 149234,
  "timeUsedMs": 1603,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 1597434210591
}```<br><strong>@jackie.jxt: </strong>Do you have live queries when running 
this? @andrew<br><strong>@andrew: </strong>i am running a load test right now 
with 500-1000 qps<br><strong>@andrew: </strong>and ingesting 5-10k 
events/second at the same time<br><strong>@mayanks: </strong>Is there a way for 
you to check if latency spike is during segment 
commit/relocation?<br><strong>@mayanks: </strong>Segment commit will pause 
kafka consumption, so you can use that as a proxy to find the time of 
commit<br><strong>@jackie.jxt: </strong>Based on the graph, I feel the high 
latency queries are from the cold start<br><strong>@andrew: </strong>here’s the 
full graph<br><strong>@andrew: </strong><br><strong>@andrew: </strong>now some 
segments are in error state<br><strong>@andrew: </strong>the data freshness 
also seems to get bad<br><strong>@mayanks: </strong>the spike at 12:30 is 
likely due to segment commit<br><strong>@mayanks: </strong>server logs should 
have time stamps of commit<br><strong>@mayanks: </strong>try offheap 
setting<br><strong>@mayanks: </strong>`_isRealtimeOffheapAllocation` in 
`IndexLoadingConfig`<br><strong>@mayanks: </strong>Even if it does not fix all 
spikes, it is a good idea to reduce GC impact<br><strong>@andrew: </strong>is 
that by setting `pinot.server.instance.realtime.alloc.offheap` to 
true?<br><strong>@andrew: </strong>what is the best way to set this with a helm 
install?<br><strong>@mayanks: </strong>No index loading config in table 
config<br><strong>@mayanks: </strong>@jackie.jxt confirm 
^^?<br><strong>@jackie.jxt: </strong>@andrew Yes, you need to set it as the 
server config<br><strong>@mayanks: </strong>Hmm, we should fix that. Or perhaps 
make it on by default<br><strong>@jackie.jxt: </strong>You can add that into 
the table config, and enable it if either table config or instance config has 
it as true<br><strong>@andrew: </strong>ok, maybe i’m missing something but i 
don’t see a field for server config in the Helm config file<br><strong>@andrew: 
</strong>i used the API to set it in the cluster config. should it take effect 
immediately? seems like the tail latency has dropped<br><strong>@mayanks: 
</strong>server config is read only at start, so i expect it would only take 
effect on restart<br><strong>@andrew: </strong>but it would read that config 
from zookeeper?<br><strong>@andrew: </strong>hm, it also looks like the cluster 
stopped ingesting data earlier, maybe related to segments in error 
state<br><strong>@andrew: </strong>i’m seeing logs like this now, and the 
server is unable to ingest data:

```2020/08/15 00:00:47.294 WARN [StateModel] 
[ZkClient-EventThread-20-pinot-zookeeper:2181] Default reset method invoked. 
Either because the process longer own this resource or session timedout
...
2020/08/14 23:57:47.584 WARN [events_REALTIME-RealtimeTableDataManager] 
[HelixTaskExecutor-message_handle_STATE_TRANSITION] Skipping adding existing 
segment: events__65__0__20200814T2352Z for table: events_REALTIME with data 
manager class: LLRealtimeSegmentDataManager```<br><strong>@mayanks: 
</strong>This is probably the cause of the segments in error state that you 
already had <br><h3><u>#aggregate-metrics-change</u></h3><br><strong>@steotia: 
</strong>@jackie.jxt, what should be the correct fix here? I am wondering if 
the fix is fairly quick, I can create a hotfix instead of rolling back and 
waiting for the fix. I am guessing we have to get back to previous behavior to 
compute min/max and remove the if check for 
aggregateMetrics.<br><strong>@steotia: </strong>currently the min and max are 
null<br><strong>@jackie.jxt: </strong>I'll submit the fix 
soon<br><strong>@jackie.jxt: </strong>Trying to add the test 
now<br><strong>@steotia: </strong>ok will wait for now. thanks a 
lot<br><strong>@jackie.jxt: 
</strong><https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMSfW2QiSG4bkQpnpkSL7FiK3MHb8libOHmhAW89nP5XKjr0tug9EOOmbvGLEos3pUw-3D-3D-Tjz_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTyI3XepWPf0zpQeVU94YsHyVBC-2FGf4L-2Fr8a23SmYbqTMfCucCXgJTV05t-2FKrK5GW5wrW217s5Y3wFG1TM2-2FOXbocLBtyvRZOaz9pPuFFU1Ox4dYiP1JPi1mGgI9umKbw-2FZipTxctb-2FYXc0q8t9jUdbqE4oSO3nMgvNNrSGtlQSUrR4x2gKleLDqtif1hRMQsSM-3D><br>

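For the one-off query response pasted in #lp-pinot-poc above, the relevant fields can be checked programmatically rather than by eye. The following is a minimal Python sketch, assuming only the JSON shape shown in that thread. The function name `summarize_broker_response` and the 1000 ms cutoff are illustrative assumptions and are not part of Pinot.

```python
import json

# Hypothetical cutoff for calling a query "slow"; tune it to your own latency target.
SLOW_QUERY_THRESHOLD_MS = 1000


def summarize_broker_response(response, threshold_ms=SLOW_QUERY_THRESHOLD_MS):
    """Pull out the fields discussed in the thread from a broker response dict.

    The field names are copied from the response pasted in the thread;
    missing fields default to 0 or an empty list.
    """
    time_used_ms = response.get("timeUsedMs", 0)
    queried = response.get("numServersQueried", 0)
    responded = response.get("numServersResponded", 0)
    return {
        "timeUsedMs": time_used_ms,
        "slow": time_used_ms > threshold_ms,
        "allServersResponded": queried == responded,
        "touchedConsumingSegment": response.get("numConsumingSegmentsQueried", 0) > 0,
        "numDocsScanned": response.get("numDocsScanned", 0),
        "hasExceptions": bool(response.get("exceptions", [])),
    }


# A subset of the response from the thread, with the values unchanged.
sample = json.loads("""
{
  "exceptions": [],
  "numServersQueried": 2,
  "numServersResponded": 2,
  "numSegmentsQueried": 2,
  "numConsumingSegmentsQueried": 1,
  "numDocsScanned": 3,
  "totalDocs": 149234,
  "timeUsedMs": 1603
}
""")

summary = summarize_broker_response(sample)
print(summary)
```

With the values from the thread, the query counts as slow under the assumed cutoff even though it scanned only 3 documents, all queried servers responded, and a consuming segment was touched. That fits the suspicion in the thread that the delay comes from something other than scan cost, such as GC or segment commit.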