#general
@burgalon: Is there a way to query the result of the star-tree index for time periods? I guess this type of query is probably executed by ThirdEye?
@mayanks: Yes, you can query Pinot for time periods using any index not just star tree.
@burgalon: where can I find info on how to construct such a Pinot SQL query?
@mayanks: It would be standard SQL (`WHERE timeCol > x AND timeCol < y`), or using `BETWEEN` as you would in SQL
@mayanks: You can also refer to
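A minimal sketch of such a time-range query, assuming a hypothetical table `myTable` with an epoch-millis time column `timeCol` (all names here are illustrative):
```sql
-- Select rows within a time window; BETWEEN is inclusive on both ends
SELECT dim1, COUNT(*) AS cnt
FROM myTable
WHERE timeCol BETWEEN 1623196800000 AND 1623283200000
GROUP BY dim1
```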
@patidar.rahul8392: Is there any way to increase server memory when starting a Pinot server in cluster mode? I have my servers on 2 different nodes, and whenever I try to refresh my Superset dashboard it fires some queries to Pinot and fetches data from the servers. One of my servers automatically shows a dead state; when I checked the log, it says there is insufficient memory for the Java Runtime Environment to continue, and the server stopped working there. Is there any way to resolve this issue? @fx19880617 @ken @jackie.jxt
@ken: If your data is replicated, then wouldn’t the other server still be able to respond to the query? Normally you’d do this by setting the `replication` value in your table json, something like:
```json
"segmentsConfig": {
  "schemaName": "blah",
  "timeColumnName": "date",
  "timeType": "DAYS",
  "replication": "2",
  "segmentPushType": "APPEND"
}
```
@patidar.rahul8392: Ohh ok @ken, actually in my config file the replication value is 1, so it's not replicating the segments
@patidar.rahul8392: So I'm wondering if there is any alternate way, because if I set this property then I need to create the table again and load all the offline and realtime data once more
@jackie.jxt: You may add another server, change the replication to 2 and then run rebalance to replicate the segments to another server
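A sketch of the rebalance step via the controller REST API, assuming the controller runs at `localhost:9000` and a table named `myTable` (endpoint shape as of 2021-era Pinot; verify the parameters against your version):
```
# Dry-run first to preview the new assignment, then run for real
curl -X POST "http://localhost:9000/tables/myTable/rebalance?type=OFFLINE&dryRun=true"
curl -X POST "http://localhost:9000/tables/myTable/rebalance?type=OFFLINE&dryRun=false"
```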
@patidar.rahul8392: Or is there any way to transfer the failed server's data to the next available server, i.e. server1?
@pedro.cls93: Hello, What is the difference between `segmentsConfig.replication` & `segmentsConfig.replicasPerPartition` for a realtime table?
@jmeyer: IIRC, `segmentsConfig.replication` is for OFFLINE tables only while `segmentsConfig.replicasPerPartition` is for REALTIME tables
@pedro.cls93: Thank you Jonathan, appreciate it! I read the same here:
@pedro.cls93: @mayanks (sorry to disturb once more...) Can you confirm our assumption?
@jackie.jxt: @pedro.cls93 Yes, Jonathan gives the right answer
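For reference, a minimal sketch of the two settings side by side in a hypothetical table config (values illustrative): `replication` applies to OFFLINE segments, `replicasPerPartition` to REALTIME consuming segments.
```json
"segmentsConfig": {
  "schemaName": "myTable",
  "timeColumnName": "date",
  "replication": "2",
  "replicasPerPartition": "2"
}
```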
@mayanks: Thanks @jackie.jxt, could we make it explicit in the docs?
@mayanks: Also @pedro.cls93 there's <#C023BNDT0N8|pinot-docsrus> for community to help out with improving docs, in case you'd like to contribute back to improve the docs.
@mapshen: For stream ingestion with Kafka, only JSON format is currently supported, right? The input formats listed here
@fx19880617: json/avro
@fx19880617: you can also write your own decoder by implementing: `org.apache.pinot.spi.stream.StreamMessageDecoder`
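A rough sketch of what such a decoder could look like, for a hypothetical CSV payload (interface methods as in the 2021-era SPI; verify against your Pinot version):
```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import org.apache.pinot.spi.data.readers.GenericRow;
import org.apache.pinot.spi.stream.StreamMessageDecoder;

// Sketch: decodes comma-separated Kafka messages into Pinot rows.
public class SimpleCsvMessageDecoder implements StreamMessageDecoder<byte[]> {
  private String[] _columnNames;

  @Override
  public void init(Map<String, String> props, Set<String> fieldsToRead, String topicName)
      throws Exception {
    // Hypothetical decoder property carrying the CSV header, e.g. "userid,time,countPerMin"
    _columnNames = props.get("header").split(",");
  }

  @Override
  public GenericRow decode(byte[] payload, GenericRow destination) {
    String[] values = new String(payload, StandardCharsets.UTF_8).split(",");
    for (int i = 0; i < _columnNames.length && i < values.length; i++) {
      destination.putValue(_columnNames[i], values[i]);
    }
    return destination;
  }

  @Override
  public GenericRow decode(byte[] payload, int offset, int length, GenericRow destination) {
    byte[] slice = new byte[length];
    System.arraycopy(payload, offset, slice, 0, length);
    return decode(slice, destination);
  }
}
```
The class is then referenced by name in the table's stream config so Pinot can instantiate it per consuming partition.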
@g.kishore: thrift/protobuf etc — the stream source is decoupled from the data format
@mapshen: i see. Missed the avro one
@mapshen: I thought we would put all Kafka decoders under `src/main/java/org/apache/pinot/plugin/stream/kafka`
@mapshen: > thrift/protobuf etc streamsource is decoupled from the data format @g.kishore would you mind elaborating on this? You mean thrift/protobuf is available for kakfa streaming ingestion? I don't see a decoder class for them under `pinot-plugins/pinot-input-format`
@g.kishore: yeah, looks like thrift/protobuf is available only for batch ingestion.. I don't see any reason why we can't add that for real-time as well
#troubleshooting
@nadeemsadim: is there any way to aggregate time-series data on the basis of a tumbling time window while ingesting into Pinot itself, say on the basis of time, and update the records automatically without any need for tumbling-window aggregation in my stream processing job (e.g. a Samza job / Spark Streaming)? i.e. can Pinot keep updating the same countPerMin column automatically whenever new feeds for the same time window come in, or does that have to be handled in the Samza / Spark Streaming jobs only? cc: @hussain
@fx19880617: right now you need to do this in a Samza or Flink job. Pinot consumes data from Kafka as-is and only supports row-level transformations.
@nadeemsadim: also my topic is not partitioned by the key on which I am aggregating (it was partitioned into, say, 20 partitions when the topic itself was created). Thus multiple events will land on the topic Pinot consumes from, for the same key at the same time. e.g. for the number of page views on 9th June at 14:13 IST, 5 records are ingested by Pinot with page-view counts of 10, 12, 4, 5, 3 for the same user id, since the topic was partitioned and multiple Samza consumers in the same consumer group were aggregating the same topic from different sets of partitions, thus emitting different msgs on the output topic Pinot consumes from. Can I upsert the Pinot record for the same user id by adding the existing value to the latest value? i.e. *userid="123", time="09/06/2021 14:17 IST", countPerMin=5* was already there; now a record with *userid="123", time="09/06/2021 14:17 IST", countPerMin=10* comes in on the Kafka topic Pinot consumes from; will it then update the existing record to *userid="123", time="09/06/2021 14:17 IST", countPerMin=15*? Is such a kind of upsert possible, some incremental-upsert kind of functionality, maybe through some sort of user-defined function?
@fx19880617: for upsert, if you make userid and time the primary key, then you can do pre-agg in your streaming job to publish the second msg like `{userid="123", time="09/06/2021 14:17 IST", countPerMin=10}` and `{userid="123", time="09/06/2021 14:17 IST", countPerMin=15}`; Pinot will upsert the `countPerMin` field
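A minimal sketch of enabling that, assuming the field names from the example above (upsert requires a realtime table whose stream is partitioned by the primary key; check the exact config keys against your Pinot version):
```
# In the schema:
"primaryKeyColumns": ["userid", "time"]

# In the realtime table config:
"upsertConfig": { "mode": "FULL" }
```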
@nadeemsadim: so here.. will countPerMin become 15, or will it become 25? I don't just want it updated with the latest value .. I want it added to the last existing value
@fx19880617: latest value
@fx19880617: 15
@nadeemsadim: any way to add to the existing value by some UDF or any other means?
@nadeemsadim: since my use case requires it to be updated with 25 rather than 15
@nadeemsadim: as 10 + 15 = 25
@fx19880617: then you don’t need upsert, you can just sum up at query time
@nadeemsadim: you mean in my Samza job itself?
@fx19880617: or you generate this 25 in your streaming job and use upsert
@nadeemsadim: > you can just sum up at query time
how to do that in Pinot?
@fx19880617: query
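i.e. a sketch of summing at query time, assuming a hypothetical table `pageViews` holding the raw per-partition counts:
```sql
-- All rows for the same (userid, time) are summed when the query runs,
-- so no pre-aggregation or upsert is needed at ingestion time
SELECT userid, time, SUM(countPerMin) AS countPerMin
FROM pageViews
GROUP BY userid, time
```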
@nadeemsadim: ok, so before publishing every event to the output Kafka topic that Pinot consumes from, I should query the Pinot broker for the existing value and then do the addition
@nadeemsadim: but it will cause too many DB calls to Pinot, and the Samza processing throughput may slow down since events may come at a very high frequency, maybe lakhs per second .. then it won't be feasible to make a network HTTP RESTful call per event
@fx19880617: or you just do aggregation on samza job
@fx19880617: you can have a KV store to keep the local state
@nadeemsadim: ok, then I have to hold the entire state of my jobs in Samza's inbuilt state-management database like LevelDB or RocksDB or a KV store
@nadeemsadim: ok got it @fx19880617, thanks for the clarification, very helpful .. will come back if any more doubts on this
@fx19880617: yeah, partition + rocksdb should work
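A rough sketch of that pattern with Samza's low-level task API and a local RocksDB-backed store (class, key format, and store name are all illustrative; check the API against your Samza version):
```java
import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

// Keeps a running count per (userid, minute) in a local store, so the
// message emitted to the Pinot topic already carries the summed value.
public class RunningCountTask implements StreamTask, InitableTask {
  private KeyValueStore<String, Integer> store;

  @Override
  @SuppressWarnings("unchecked")
  public void init(Config config, TaskContext context) {
    // "count-store" must be declared as a RocksDB store in the job config
    store = (KeyValueStore<String, Integer>) context.getStore("count-store");
  }

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
      TaskCoordinator coordinator) {
    // Assumed key format: "userid|minute"; message body is the increment
    String key = (String) envelope.getKey();
    int increment = Integer.parseInt((String) envelope.getMessage());
    Integer current = store.get(key);
    int updated = (current == null ? 0 : current) + increment;
    store.put(key, updated);
    // ... emit {key, updated} to the output topic via collector.send(...)
  }
}
```
Because the input topic is partitioned by key, each task instance owns its keys exclusively, so the local store never races with another consumer.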
@nadeemsadim: sure will try that
@mapshen: Hi Folks, it seems that I cannot get `docker build` to work, although it compiles on my local computer. For example, if I do `./docker-build.sh pinot` under `docker/images/pinot`, it fails when compiling `pinot-controller`, and the error is
```
#12 241.2 [INFO] --- frontend-maven-plugin:1.1:npm (npm run-script build) @ pinot-controller ---
#12 241.2 [INFO] Running 'npm run-script build' in /opt/pinot-build/pinot-controller/src/main/resources
#12 241.4 [INFO]
#12 241.4 [INFO] > pinot-controller-ui@1.0.0 build /opt/pinot-build/pinot-controller/src/main/resources
#12 241.4 [INFO] > webpack --mode production
#12 241.4 [INFO]
#12 257.6 [ERROR] Killed
#12 257.6 [ERROR] npm ERR! code ELIFECYCLE
#12 257.6 [ERROR] npm ERR! errno 137
#12 257.6 [ERROR] npm ERR! pinot-controller-ui@1.0.0 build: `webpack --mode production`
#12 257.6 [ERROR] npm ERR! Exit status 137
#12 257.6 [ERROR] npm ERR!
#12 257.6 [ERROR] npm ERR! Failed at the pinot-controller-ui@1.0.0 build script.
#12 257.6 [ERROR] npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
#12 257.6 [ERROR]
#12 257.6 [ERROR] npm ERR! A complete log of this run can be found in:
#12 257.6 [ERROR] npm ERR! /root/.npm/_logs/2021-06-09T16_55_54_192Z-debug.log
...
#12 264.0 [ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.1:npm (npm run-script build) on project pinot-controller: Failed to run task: 'npm run-script build' failed. (error code 137) -> [Help 1]
#12 264.0 [ERROR]
#12 264.0 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
#12 264.0 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
#12 264.0 [ERROR]
#12 264.0 [ERROR] For more information about the errors and possible solutions, please read the following articles:
#12 264.0 [ERROR] [Help 1]
```
@jackie.jxt: This page might help:
@mapshen: That was it! Nice find @jackie.jxt. This 2GB mem limit is probably unique to Mac, as I recently switched to it and never had issues running on Linux
#pinot-dev
@fx19880617: Please take a look at this: Upgrade pinot to use java11 compiler and allowed fallback to build and run with JDK 8:
@snlee: @fx19880617 are we dropping Java 8 and moving towards Java 11? (i.e. we officially moved to Java 8 to use lambda features a while ago)
@fx19880617: Yes, that’s the plan.
@snlee: got it. At LinkedIn, we internally migrated to JVM 11 for the runtime env and saw good performance improvements in GC metrics.
@fx19880617: cool, this PR makes Pinot compilable with Java 11; currently Pinot can use JDK 11 to build a Java 8 binary and run on Java 11
@snlee: I think that we can announce that Pinot officially supports Java 11 when we cut the next release.
@fx19880617: sounds good
@mayanks: Hi team, we have `"realtime.max.parallel.segment.builds"` that controls the number of parallel segment builds on realtime servers. I notice that the default value is 0 (which I think implies unconstrained, i.e. inherently capped only by the number of partitions). I am curious to know if folks have found this default to be good? If not, should we have a more conservative default (say 4)? cc @ssubrama @steotia @yupeng
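For anyone following along, a sketch of setting a conservative cap in the server config file (the key is as named above; the exact property prefix and file location may vary by version and deployment):
```
# Limit concurrent realtime segment builds on this server
realtime.max.parallel.segment.builds=4
```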