Apache Pinot Daily Email Digest (2020-08-19)

Pinot Slack Email Digest Wed, 19 Aug 2020 19:01:12 -0700

<h3><u>#general</u></h3><br><strong>@ankit.raj.singh: </strong>Hi all, Pinot 
seems to support a Map data type for columns. Is it possible to query it? If 
so, is there an example I can refer to?<br><strong>@cj: </strong>@cj has 
joined the channel<br><strong>@joey: </strong>I was having some trouble making 
a schema/table with transform configs, following the documentation at 
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMdTeAXadp8BL3QinSdRtJdpjVdLs6pcP-2BmJYu0RGOJgjW4yUWessZqxMl9cgpZEQvZ61b7e-2FULIdDgh3zQyBB6rFDS3rtq9FjWH9nlZ0hZ99VBNms8zq1x7TdCDGsaZg5i-2F97vZlpfEl6fip6rjl-2BUw-3DvQRv_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxrfh0vLw2LDMQsRtDNJtUKznpOvlN6rnjjXxrv6HQncsTrUBV7sNMNMlck104lvPWDKKJzp4EBhVJasu6WvVgcEwgXPBcIVI9J2sJtXBTYO3Xdrveqjs7cCva2Zb5pWwRF2ZMWmJ3ydDCMrjWhuc86g-2BttX2xHbQe4GxlpzZKvWWd5nu4yGamYNHfPdW2IN3U-3D>
 (=&gt; thread)<br><strong>@joey: </strong>Is there any way to get a query plan 
from Pinot, such as indexes used to pull data, or is this something that you 
can only really interpret from the query response stats on the Controller's 
query console?<br><strong>@luu: </strong>@luu has joined the 
channel<br><h3><u>#random</u></h3><br><strong>@cj: </strong>@cj has joined the 
channel<br><strong>@luu: </strong>@luu has joined the 
channel<br><h3><u>#troubleshooting</u></h3><br><strong>@laxman: </strong>Folks, 
Have some doubts related to rebalance. Can someone please clarify these or 
point me to relevant documentation.
=============
• One of our Pinot servers was scaled down (Kubernetes - from 4 servers to 3 
servers).
• Even after several hours segments didn’t come online.
• Same case with CONSUMING segments. Kafka partitions which were getting 
processed by scaled down server are now not getting processed at all.
=============
• When does a rebalance get triggered? I already tried server/controller 
restarts. I also tried a rebalance from the controller UI.
• What is the right way to scale down a server?
FYI: we are on 0.3.0 + some fixes, in case it 
matters<br><strong>@elon.azoulay: </strong>Is there a quick way we can convert 
realtime segments to offline segments? Is there any benefit to doing that since 
 the realtime segment is created with star tree, inverted, sorted and text 
indexes?<br><h3><u>#onboarding</u></h3><br><strong>@pratiksethi.pro94: 
</strong>@pratiksethi.pro94 has joined the 
channel<br><h3><u>#transform-functions</u></h3><br><strong>@pratiksethi.pro94: 
</strong>@pratiksethi.pro94 has joined the 
channel<br><h3><u>#community</u></h3><br><strong>@pratiksethi.pro94: 
</strong>@pratiksethi.pro94 has joined the 
channel<br><h3><u>#pinot-0-5-0-release</u></h3><br><strong>@laxman: 
</strong>@tingchen: Will you be able to cherry-pick the following to 0.5.0 rc 
branch.
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMSfW2QiSG4bkQpnpkSL7FiK3MHb8libOHmhAW89nP5XK7afLY5WyBExz0XzvjqBqCA-3D-3DIZrI_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxrfh0vLw2LDMQsRtDNJtUKjlRTDT1MIsm5StTQgCMYTna5ln6-2BdwGiIjdvTK-2BnwhNEXTr6S3l16LppcdNN-2B7eOc6s-2B1sRTt-2BEpPynmkSvnL4ve8uVmzH5MkmWapUgDJMRPaUMlttTLfTGDPWBgwnxqrZX7PoPmPGjyjgEpNKr1Te172Ohk8ZLIvfoSizXJD3M-3D><br><h3><u>#multiple_streams</u></h3><br><strong>@fra.costa:
 </strong>@g.kishore sorry to bother you again, but I would like to ask for one 
more clarification about the duality between real-time and offline. As we 
mentioned, with a hybrid table setup, if an entry appears in both stores at 
query time, the offline one takes precedence.
• How is the “collision” determined? Is there some sort of identifier column in 
the table schema that governs that?
• When setting up real-time and offline, my understanding is that real-time has 
some sort of window associated with it: 
              1. What happens when that period passes? Are entries purged by 
the real-time datastore?
              2. If there is a case in which entries are consolidated and 
moved from real-time to offline, how is the same-key aspect dealt with? Are 
real-time entries dismissed in favor of the existing offline ones (if any)?


I apologize if some of these questions are already addressed in the 
documentation, happy to read relevant sections in case I missed them
Thanks,<br><strong>@npawar: </strong>Regarding hybrid table and which data 
takes precedence, this might help: 
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMdTeAXadp8BL3QinSdRtJdpzMyfof-2FnbcthTx3PKzMZIvTvz0ZlGzjfnWuiLO3kB-2FQ-3D-3DH8aO_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxrfh0vLw2LDMQsRtDNJtUKV71Hy2Q4Pn7aIJhiBtj2cRegrX4tFr-2FAQhLCYrJxdrrPwA-2FojQcedZ3qZzH-2Fsrx2j1-2FCfpysdUDHM9qxYbqxSMq-2B4fjMf6BI4YAls9x3uEePBxC-2BP3r5Vk291rbC4-2Bk1U0O7nB6ukJr9tLp-2FsoWnL-2FzajZnhH6J1HItSIq9nkhk-3D>
 (time boundary section)<br><strong>@npawar: </strong>I’m not sure what you 
mean by “window” of a realtime table. We have a concept of retention. This can 
be configured for both realtime and offline tables.  If the data in the table 
becomes older than the retention, it is deleted<br><strong>@fra.costa: 
</strong>Thanks Neha, I am going to read that and reply 
after<br><strong>@fra.costa: </strong>That page perfectly explains the first 
question: it’s done on the time-series dimension, with no per-object key 
involved. Makes sense.

As for retention, yes, that is what I was referring to. It seems to be a 
different concept from the behavior I was worrying about.

I guess the only question left is whether there are cases in which Pinot 
independently consolidates realtime segments into the offline ones. I have a 
vague memory of reading something about it, but I’m not 100% 
sure<br><strong>@fra.costa: </strong>If that is the case I am basically trying 
to understand if the Offline data is replaced by the newly “consolidated” 
realtime segments<br><strong>@npawar: </strong>as of now, the offline data 
needs to be populated by you, with your own offline jobs 
setup.<br><strong>@npawar: </strong>but,<br><strong>@npawar: </strong>we have a 
project ongoing, which will move segments from realtime to offline table - 
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMc9VK8AZw4xfCWnhVjqO8F0jpwxWv4fC4LAZTjvhd54Mnnp7A4BBhAtbRr8NR9LakFgJLRyTGnwArJQe4yssb406fh8XVHzSkiiNBgKkuV9jApFM_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxrfh0vLw2LDMQsRtDNJtUKfQ5n1DqLN1oI6Kdf-2B-2BIFqOJ37P7YUpw18Cen-2BZjMNo1f978pXofjkt-2Bxq6x9J4Gr-2BGefv-2Bk8LAfFshDvLEiocHd3cc5YYcM7XrNNKEnRbWifqSbE6pf9JvezRzwoxMPYb1epC32X1qX7uOxpXZcMV9WWp4Wnv8ImMONCzT-2BPgo4-3D><br><strong>@npawar:
 </strong>though I don’t know if that’ll help in your case, since you need the 
accurate data merged with the realtime data<br><strong>@fra.costa: 
</strong>Thanks Neha, in my case it would actually hurt 
us<br><strong>@fra.costa: </strong>I was more watching out for that happening 
under the cover<br><strong>@fra.costa: </strong>so in that regard we are good, 
thank you very much<br>
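
To illustrate the time-boundary behavior discussed in this thread, here is a minimal Python sketch. It assumes the boundary is the latest offline segment end time minus one time unit, which is how the time boundary section of the docs appears to describe it. The function names and the (time, value) row shape are hypothetical and are not part of any Pinot API.

```python
# Illustrative only: these helpers are hypothetical, not Pinot APIs.

def time_boundary(offline_segment_end_times, offset=1):
    """Assumed rule: latest offline segment end time minus one time unit."""
    return max(offline_segment_end_times) - offset


def split_filters(time_column, boundary):
    """Time filters conceptually added to the offline and realtime sub-queries."""
    offline_filter = f"{time_column} <= {boundary}"
    realtime_filter = f"{time_column} > {boundary}"
    return offline_filter, realtime_filter


def hybrid_rows(offline_rows, realtime_rows, boundary):
    """Rows are (time, value) tuples; each store answers only its side."""
    from_offline = [row for row in offline_rows if row[0] <= boundary]
    from_realtime = [row for row in realtime_rows if row[0] > boundary]
    return from_offline + from_realtime
```

Because the split is made purely on the time column, overlapping rows are never compared by key: under this assumption, a row at or before the boundary is served from the offline table and anything later from the realtime table.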
