Apache Pinot Daily Email Digest (2020-07-21)

Pinot Slack Email Digest Tue, 21 Jul 2020 20:01:24 -0700
<h3><u>#general</u></h3><br><strong>@mailtobuchi: </strong>Quick question about 
Pinot query.
If this was the Pinot query result plan, does this mean `numSegmentsProcessed` 
segments were mem mapped?
```
{
    "resultTable": {
        "dataSchema": {
            "columnDataTypes": ["BYTES"],
            "columnNames": ["id"]
        },
        "rows": [
            ["6a254bd3c853e950"]
        ]
    },
    "exceptions": [],
    "numServersQueried": 4,
    "numServersResponded": 4,
    "numSegmentsQueried": 1237,
    "numSegmentsProcessed": 1229,
    "numSegmentsMatched": 4,
    "numConsumingSegmentsQueried": 8,
    "numDocsScanned": 4,
    "numEntriesScannedInFilter": 4,
    "numEntriesScannedPostFilter": 4,
    "numGroupsLimitReached": false,
    "totalDocs": 265510367,
    "timeUsedMs": 32,
    "segmentStatistics": [],
    "traceInfo": {},
    "minConsumingFreshnessTimeMs": 1595297570989
}
```
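For reference, the segment counters in a response like this one can be compared with a few lines of Python. This is only a minimal sketch and not part of Pinot: `summarize_segments` and its output keys are hypothetical names, and reading `numSegmentsQueried - numSegmentsProcessed` as segments that were skipped before processing is an assumption, not something the response itself states.

```python
import json

# Subset of the broker response quoted above; only the counters used below are kept.
RESPONSE_JSON = """
{
    "numServersQueried": 4,
    "numServersResponded": 4,
    "numSegmentsQueried": 1237,
    "numSegmentsProcessed": 1229,
    "numSegmentsMatched": 4,
    "numConsumingSegmentsQueried": 8,
    "numDocsScanned": 4,
    "totalDocs": 265510367
}
"""


def summarize_segments(response: dict) -> dict:
    """Compare the segment counters of a broker response (hypothetical helper)."""
    queried = response["numSegmentsQueried"]
    processed = response["numSegmentsProcessed"]
    matched = response["numSegmentsMatched"]
    return {
        # Assumption: queried but not processed means skipped before processing.
        "skipped": queried - processed,
        # Processed segments that produced no matching documents.
        "processed_without_match": processed - matched,
        "matched": matched,
        "all_servers_responded": (
            response["numServersResponded"] == response["numServersQueried"]
        ),
    }


summary = summarize_segments(json.loads(RESPONSE_JSON))
print(summary)
```

None of these counters says anything about memory mapping; they only count segments at each stage of the query.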
<br><strong>@mayanks: </strong>No, this is the number of segments the query had 
to process.<br><strong>@hiboss1: </strong>@hiboss1 has joined the 
channel<br><strong>@dlavoie: </strong>Wouldn't Pinot make an incredible 
datasource for Grafana?<br><strong>@pradeepgv42: </strong>@steotia Thanks a lot 
for enabling the TEXT_MATCH feature on dictionary encoded columns.
On a smaller table with ~25M rows, a simple regexp_like query takes 178ms, whereas 
TEXT_MATCH takes ~30ms.
This is pretty cool.<br><strong>@g.kishore: </strong>Amazing video by 
@kennybastani on Deploying Pinot on Kubernetes 
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMc92bR9g-2BkGUUQX5IM7P9-2BAhHRVzOXTS92je0dEky-2B6erluPv4yRe5Qxf8-2BPzQrpHg-3D-3DJHPT_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTybC1-2B-2FcUzX2RLE0WEiXpW8-2FoU2Y6JwxpYBGsUTMOfMOQsqm51Fyzpb3bLaYfh1TSyYryVJawHiPEsIw2FwM9lfYIO-2FfnyBg2faUY-2FGTgbUWpBy2hTlI03MzGgGB5mzTur8qzcpXO4hA0qczzwAyzOolfDvN60lMyZoH2Uda0Bgz2qIA-2BXBWWA94G1dOwatLvc-3D><br><strong>@rahulvinaykumar.chhap:
 </strong>Thanks for sharing :slightly_smiling_face:<br><strong>@sanjay: 
</strong>@sanjay has joined the 
channel<br><h3><u>#random</u></h3><br><strong>@hiboss1: </strong>@hiboss1 has 
joined the channel<br><strong>@sanjay: </strong>@sanjay has joined the 
channel<br><h3><u>#troubleshooting</u></h3><br><strong>@ankit.raj.singh: 
</strong>@ankit.raj.singh has joined the channel<br><strong>@elon.azoulay: 
</strong>We are about to upgrade to pinot-0.4.0 - do you recommend going to 
head or just cutting it at the 0.4.0 release commit?<br><strong>@elon.azoulay: 
</strong>Any notable config changes, or k8s changes we should be aware of? 
We're on pinot-0.3.0 now<br><strong>@damianoporta: </strong>Nooooo I have just 
upgraded my custom aggregation function :smile: did you change the 
API?<br><strong>@damianoporta: </strong>:joy:<br><strong>@g.kishore: 
</strong>@elon.azoulay I would go with 0.4.0 unless you need any feature in 
master<br><strong>@quietgolfer: </strong>Sorry, I think I've asked before (I 
lost my slack history).  Is there an easy way to have Pinot take the realtime 
inputs and automatically run data ingestion jobs to populate the offline 
tables?  Mostly checking to see if I can shortcut some work for a v1 
deliverable.  I assume there is probably a simple setup to output the kafka 
topic for 1 day, split the data and run batch ingestion 
jobs.<br><strong>@g.kishore: </strong>Yes, it’s doable, but there is no such 
tool.<br><strong>@g.kishore: </strong>You can download the real-time segments, 
use the Pinot segment reader to read multiple segments, generate a new offline 
segment, and push it.<br>
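The "split the data" step mentioned in this thread can be sketched in plain Python. This is a rough illustration under stated assumptions, not a Pinot tool or API: `bucket_by_day` and the `timestampMs` field are hypothetical names, the records are assumed to be already read out of the real-time segments as dictionaries, and timestamps are assumed to be epoch milliseconds bucketed by UTC day. Reading the segments, building the offline segment, and pushing it are left out.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_by_day(records, time_field="timestampMs"):
    """Group records by UTC calendar day (hypothetical helper, not a Pinot API)."""
    buckets = defaultdict(list)
    for record in records:
        # Assumption: the time column holds epoch milliseconds.
        seconds = record[time_field] / 1000
        day = datetime.fromtimestamp(seconds, tz=timezone.utc).date()
        buckets[day.isoformat()].append(record)
    return dict(buckets)


# Made-up sample rows standing in for data read from real-time segments.
sample_records = [
    {"id": "a", "timestampMs": 1595289599000},  # 2020-07-20 23:59:59 UTC
    {"id": "b", "timestampMs": 1595289600000},  # 2020-07-21 00:00:00 UTC
    {"id": "c", "timestampMs": 1595297570989},  # 2020-07-21 02:12:50 UTC
]

daily = bucket_by_day(sample_records)
for day, rows in sorted(daily.items()):
    print(day, [row["id"] for row in rows])
```

Each daily bucket would then feed one batch ingestion job for the offline table.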