#general
@bryagd: @bryagd has joined the channel
@mercyshans: @mercyshans has joined the channel
#random
@bryagd: @bryagd has joined the channel
@mercyshans: @mercyshans has joined the channel
#troubleshooting
@bryagd: @bryagd has joined the channel
@patidar.rahul8392: While loading data from HDFS to a Pinot table I'm getting this exception:
[r-2 apache-pinot-incubating-0.7.1-bin]$ hadoop jar ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand -jobSpecFile /home/rah/executionFrameworkSpec.yaml
Exception in thread "main" java.io.FileNotFoundException: /tmp/hadoop-unjar7575411926296177023/shaded/com/google/common/collect/ImmutableSetMultimap$EntrySet.class (No space left on device)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
    at org.apache.hadoop.util.RunJar.unJar(RunJar.java:110)
    at org.apache.hadoop.util.RunJar.unJar(RunJar.java:85)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Someone kindly suggest.
@jmeyer: Maybe something to do with `No space left on device` ?
@patidar.rahul8392: Yes @jmeyer, it's related to a space issue in /tmp; Pinot is creating plugin files there, but I am not specifying this /tmp directory anywhere in my configuration. Is there any way to change this tmp location? I want to move these files to a different location where I have enough space.
@patidar.rahul8392: These are the files Pinot is creating at the /tmp location.
@patidar.rahul8392: Or is there any way to clean this /tmp location before loading data for the next file? I am able to load data for 3 files, but when I try the 4th I get this space issue.
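A minimal sketch of reclaiming /tmp between runs, based on the `hadoop-unjar*` directories visible in the stack trace above (not a suggestion from the thread; verify the paths before deleting anything):
```
# Failed runs leave hadoop-unjar* directories behind in /tmp; remove them
# and confirm free space before the next ingestion run.
rm -rf /tmp/hadoop-unjar*
df -h /tmp
```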
@jmeyer: From what I read, the default volume size for Docker containers is 10 GB; there must be a setting to increase that value. Alternatively, you could explicitly mount a volume, which, IIRC, would only be limited by the host's storage.
@jmeyer: But the opinion of someone more familiar with Pinot / Docker would be appreciated :smile:
@jmeyer: (if you're using docker at all, actually)
@patidar.rahul8392: Ok @jmeyer, I'm not using Docker.
@patidar.rahul8392: Actually, only 943 MB of space is available here, so I am wondering whether there is any way in Pinot to create these /tmp files at some other location.
@laxman: @patidar.rahul8392: you can set `hadoop.tmp.dir` to the location you want to
@patidar.rahul8392: Ok, in which file do I need to set this? Or is it something I can set directly via export in the terminal?
@laxman: You have to set this in `core-site.xml` Or hadoop configs can be set as a Java properties as well `-Dkey=value`
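A sketch of both options @laxman mentions; `/data/hadoop-tmp` is an example path with enough free space, not something given in the thread. In `core-site.xml`:
```
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop-tmp</value>
</property>
```
Or as Java properties on the client JVM. Since the failing unjar step in the stack trace runs client-side and unpacks into `java.io.tmpdir`, redirecting that as well via `HADOOP_CLIENT_OPTS` may be needed:
```
mkdir -p /data/hadoop-tmp
export HADOOP_CLIENT_OPTS="-Dhadoop.tmp.dir=/data/hadoop-tmp -Djava.io.tmpdir=/data/hadoop-tmp"
hadoop jar ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  -jobSpecFile /home/rah/executionFrameworkSpec.yaml
```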
@patidar.rahul8392: Ok @laxman, here I am trying to load only a 2 KB file and the space available is 943 MB. But it's still giving the space issue?
@jmeyer: Hello :slightly_smiling_face: I've just changed the topic from which a REALTIME table is consuming, and new messages are being published on that topic. However, it looks like Pinot isn't consuming them. I can see a segment with `"segment.realtime.status": "IN_PROGRESS"` / "CONSUMING". Also, I'm not seeing any related logs. On the previous topic, consumption was OK :heavy_check_mark:
@ssubrama: You cannot change the topic on a live table. You need to drop the table and recreate it.
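A sketch of the drop-and-recreate step via the controller REST API; host, port, table name, and config file are examples, not from the thread:
```
# 1) Drop the realtime table (its consuming segments go with it).
curl -X DELETE "http://localhost:9000/tables/myTable?type=realtime"
# 2) Recreate it from a table config that points at the new topic.
curl -X POST "http://localhost:9000/tables" \
  -H "Content-Type: application/json" \
  -d @myTable-realtime.json
```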
@jmeyer: Should we find a way to clarify this? Maybe deny the table update? Plus documentation (if it does not already exist ^^)
@mayanks: Agree on both. @jmeyer could you add it to FAQ?
@mayanks: And add an issue for preventing the table update?
@jmeyer: @mayanks Yep :slightly_smiling_face:
@mayanks: :thankyou:
@jmeyer: Issue:
@jmeyer: PR on docs:
@mayanks: :thankyou:
@jmeyer: Hello again. Not a Pinot-only question, but I'm sure most of you have had to deal with this issue, so here I go: given a limited Kafka retention, how do you handle recreating a table with past data that is no longer available in Kafka? Basically, what is the "workflow" you use to repopulate a Pinot table from past data?
@mayanks: Typically, folks ETL the Kafka data into a source-of-truth store like HDFS. You can then backfill via an offline pipeline. However, this pattern is applicable to hybrid tables.
@mayanks: For realtime-only tables, you are going to be limited by Kafka retention.
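A sketch of the backfill step for a hybrid table, reusing the ingestion command from earlier in the thread; `backfillJobSpec.yaml` is a hypothetical job spec that would read the HDFS copy of the Kafka data and push the built segments to the OFFLINE table:
```
hadoop jar ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  -jobSpecFile /home/rah/backfillJobSpec.yaml
```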
@jmeyer: > For realtime-only tables, you are going to be limited by Kafka retention.
Probably not a smart solution, but dumping past data (from some object store) back into Kafka for the REALTIME table could work too, I guess? Otherwise, does using a hybrid table add a lot of complexity / limitations?
@ssubrama: @jmeyer you can also use
@jmeyer: Thanks @ssubrama I'll give it a good read :slightly_smiling_face:
@ken: Thanks @ssubrama - I didn’t know about this built-in support for auto-offlining old data.
@mercyshans: @mercyshans has joined the channel
@mercyshans: Hi, I am wondering if this is
#getting-started
@santosh.reddy: @santosh.reddy has joined the channel
#releases
@santosh.reddy: @santosh.reddy has joined the channel