#general
@gqian3: Hi team, we are currently evaluating a solution using a Pinot hybrid table to produce a dataset with both S3 offline historical data and Kafka real-time data. Are there any documents describing what a hybrid table setup does and doesn't support, e.g. regarding ingestion, query, and retention? Thanks.
@xiangfu0: In Pinot you can configure the deep store on S3 and create a hybrid table that ingests data from both a batch data source (S3) and a real-time data source (Kafka).
@gqian3: Thanks. So far we have only used offline tables. Other than the table configuration setup, are there any known functional differences, limitations, or constraints of using a hybrid table compared to an offline table, in terms of query, retention, and ingestion?
@xiangfu0: The real-time table has a different retention than the offline table. Ingestion-wise, it's from Kafka. Each query is split into two queries based on the time boundary. Please check the doc for details.
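In case a concrete illustration helps, here is a minimal sketch of the two table configs behind a hybrid table: an OFFLINE and a REALTIME table sharing the same name, each with its own retention. The table name, time column, topic, broker list, and retention values below are made up for illustration; the stream settings mirror the Kafka ingestion keys already shown elsewhere in this thread.
```
{
  "tableName": "myTable",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "eventTime",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "365",
    "replication": "1"
  },
  "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant" },
  "tableIndexConfig": { "loadMode": "MMAP" },
  "metadata": {}
}
```
```
{
  "tableName": "myTable",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "eventTime",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "7",
    "replicasPerPartition": "1"
  },
  "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant" },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "myTopic",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory"
    }
  },
  "metadata": {}
}
```
Queries against `myTable` are then split at the time boundary: rows older than the boundary are served from the offline segments (backed by the S3 deep store) and newer rows from the real-time segments.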
@dunithd: I know the Lambda architecture is old-school. But is it correct to say that Pinot fits into the ‘serving layer’ there?
@mayanks: It is the unified serving + speed layer?
@tgauchoux: @tgauchoux has joined the channel
@senthissenthh: @senthissenthh has joined the channel
@dadelcas: Is there a way to configure the desired segment size for the segment creation job? I've got some small Avro files and the job seems to create a segment per file, is this how it works? I'd like to squash these small files into one bigger segment. Do I need to pre-process them myself before running the job?
@mayanks: There is a meetup today on segment merging and roll-up which might help in your case.
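For reference ahead of the session, a rough sketch of the Minion-based merge/roll-up task that the meetup covers, added under the offline table's task config, looks roughly like the snippet below. The bucket period, buffer period, and record limit are illustrative values only, and this approach also needs Pinot Minions plus the controller task scheduler running, so please verify the exact keys against the docs for your Pinot version.
```
"task": {
  "taskTypeConfigsMap": {
    "MergeRollupTask": {
      "1day.mergeType": "concat",
      "1day.bucketTimePeriod": "1d",
      "1day.bufferTimePeriod": "1d",
      "1day.maxNumRecordsPerSegment": "2000000"
    }
  }
}
```
With "concat" the small segments in each time bucket are simply merged into fewer, larger segments; "rollup" additionally aggregates rows that share the same dimension values.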
@dadelcas: I'm registered and I plan to attend, thank you
@karinwolok1: Join LinkedIn engineering team members in 15 minutes for this meetup! @snlee @jiapengtao0 See you there! :slightly_smiling_face:
@dadelcas: Good session! :+1:
@karinwolok1: In case you missed it :wine_glass: Presentation by @snlee (Senior Software Engineer @ LinkedIn and Apache Pinot PMC) and @jiatao (Software Engineer @ LinkedIn):
#random
@tgauchoux: @tgauchoux has joined the channel
@senthissenthh: @senthissenthh has joined the channel
#troubleshooting
@nadeemsadim: @mayanks @xiangfu0 @jackie.jxt pinot-server RAM usage keeps increasing over time after removing the garbage collection params from jvmOpts in the pinot helm values.yaml. Before, we were using `jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-controller.log"`, but after migrating to JDK 11 with these jvmOpts the pods started crashing, so we had to remove them and are now only using `jvmOpts: "-Xms2M -Xmx8G -Xloggc:/opt/pinot/gc-pinot-controller.log -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml"` for the server. Without the GC params, pinot-server RAM usage keeps growing over time and is exhausted by more than half a GB every day. What jvmOpts should we provide in helm for JDK 11 so that the pods don't crash, GC happens properly, and the heap is freed? Also, for 16 GB of server RAM, what should the Xmx value be for the server pod?
@xiangfu0: you can still use `-XX:+UseG1GC`
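For what it's worth, here is a minimal sketch of what the server jvmOpts in the helm values.yaml could look like on JDK 11, assuming roughly half of a 16 GB pod is given to the heap so that mmap'd segments and other off-heap usage have headroom; the heap sizes and log path are assumptions, not a recommendation tuned to any particular workload.
```
server:
  jvmOpts: >-
    -Xms8G -Xmx8G
    -XX:+UseG1GC -XX:MaxGCPauseMillis=200
    -Xlog:gc*:file=/opt/pinot/gc-pinot-server.log
    -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml
```
Note that several of the old GC-logging flags (e.g. `-XX:+PrintGCDateStamps`) were removed in JDK 9+ and make the JVM refuse to start, which may be why the pods crashed after the migration; `-Xlog:gc*` is the JDK 11-style replacement for them.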
@nadeemsadim: What should the pinot-server Xmx value be in the helm jvmOpts if 16 GB of RAM is provided per pinot-server pod?
@richard892: @nadeemsadim my guess is you have a high cardinality inverted index (do you?) and that means you have a lot of `SoftReference`s, which would be cleared more aggressively by G1 on JDK8 than JDK11. If that's it, the issue should be fixed here:
@richard892: The best way to figure this out is to look at a *live* heap dump, or use JFR `OldObjectSample` (do *NOT* do this in production, it has very high overhead) - see here
@nadeemsadim: yes I do have inverted index on many columns some of which have high cardinality
@nadeemsadim: I see the PR
@richard892: Would you be able to confirm the suspected cause with a live heap dump (`jmap -dump:live,file=dump.bin <pid>`)? If my guess is correct, there should be a lot of `ImmutableRoaringBitmap` by retained size.
@richard892: Do not share the heap dump because the strings will contain sensitive data, but a screenshot of the top retained sizes by type, either from MAT or the JVisualVM heap dump viewer, would confirm the guess.
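If a quick reference helps, the commands involved look roughly like this (the paths, recording length, and `<pid>` are placeholders; run against a non-production instance where possible):
```
# Live heap dump (triggers a full GC first, so only reachable objects are kept)
jmap -dump:live,format=b,file=/tmp/pinot-server.hprof <pid>

# Quick class histogram without taking a full dump
jcmd <pid> GC.class_histogram | head -n 30

# JFR recording with old-object sampling (high overhead, avoid in production)
jcmd <pid> JFR.start duration=120s path-to-gc-roots=true filename=/tmp/pinot-server.jfr
```
The resulting .hprof file can then be opened in MAT or JVisualVM to sort classes by retained size, as suggested above.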
@nadeemsadim: ok let me check
@trustokoroego: Hi, I get the below error when starting a Pinot broker, any idea what could be causing it? The key thing I want to achieve is to make the broker use the hostname instead of the IP, which changes on restart:
```
Executing command: StartBroker -zkAddress pinot-zookeeper:2181 -configFileName /tmp/config/broker.conf
Caught exception while starting broker, exiting
java.lang.NullPointerException: null
    at java.util.HashMap.putMapEntries(HashMap.java:496) ~[?:?]
    at java.util.HashMap.putAll(HashMap.java:780) ~[?:?]
    at org.apache.pinot.tools.admin.command.StartBrokerCommand.getBrokerConf(StartBrokerCommand.java:140) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    at org.apache.pinot.tools.admin.command.StartBrokerCommand.execute(StartBrokerCommand.java:121) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:166) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:186) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
```
@trustokoroego: Config setting:
```
# Pinot Cluster name
pinot.cluster.name=pinot-qua
# Use hostname as Pinot Instance ID other than IP
pinot.set.instance.id.to.hostname=true
# Pinot Broker Query Port
pinot.broker.client.queryPort=8099
# Pinot Routing table builder class
pinot.broker.routing.table.builder.class=random
```
@tgauchoux: @tgauchoux has joined the channel
@senthissenthh: @senthissenthh has joined the channel
@bajpai.arpita746462: Hi all, I am trying to enable "UPSERT" mode in a REALTIME table config in Pinot 0.8.0, but the table is not able to read the records sent to the Kafka topic. No results are displayed in the Pinot UI at all; it shows 0 records. Below is the config I added for upsert: `"routing": { "instanceSelectorType": "strictReplicaGroup" }, "upsertConfig": { "mode": "FULL" }`. I could not find anything significant in the controller logs either. But when I remove the upsert config, my real-time table is able to read the records and they are displayed in the Pinot UI. Any idea why this is happening?
@dadelcas: Just to confirm, have you defined a primary key in your schema?
@bajpai.arpita746462: yes
@dadelcas: It may help if you post both your table config and schema
@bajpai.arpita746462: We are suspecting a problem on the Kafka side; we are trying to create the topic with proper partitioning. Below is the schema:
```
{
  "schemaName": "wxcanalytics",
  "primaryKeyColumns": ["orgId", "reportId"],
  "dimensionFieldSpecs": [
    { "name": "reportId", "dataType": "STRING" },
    { "name": "orgId", "dataType": "STRING" },
    { "name": "firstName", "dataType": "STRING" },
    { "name": "lastName", "dataType": "LONG" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "pdate",
      "dataType": "STRING",
      "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
      "granularity": "1:DAYS"
    }
  ]
}
```
Table config:
```
{
  "tableName": "wxcanalytics_REALTIME",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeType": "DAYS",
    "schemaName": "wxcanalytics",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "7",
    "timeColumnName": "pdate",
    "replicasPerPartition": "1"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "bc_data",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "xxxxxx:xxxxx",
      "realtime.segment.flush.threshold.rows": "1000000",
      "realtime.segment.flush.threshold.time": "1h",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    },
    "enableDynamicStarTreeCreation": false,
    "aggregateMetrics": false,
    "nullHandlingEnabled": false,
    "autoGeneratedInvertedIndex": false,
    "createInvertedIndexDuringSegmentGeneration": false,
    "loadMode": "MMAP",
    "enableDefaultStarTree": false
  },
  "metadata": {
    "customConfigs": {}
  },
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "upsertConfig": {
    "mode": "FULL"
  },
  "isDimTable": false
}
```
@gabuglc: can u try moving the primaryKeyColumns after dateTimeFieldSpecs?
@gabuglc: "upsertConfig": { "mode": "FULL", "hashFunction": "NONE" },