#general
@wentjin: Hi Team, I found an issue when using partial-mode upsert. When getting a previous record before the merge, we always reuse the GenericRow object without clearing or updating _nullValueFields (see MutableSegmentImpl#getRecord). PartialUpsertHandler then uses the possibly stale _nullValueFields to check whether a column value is null. This causes partial upsert to miss previous column data and not work.
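@wentjin: To illustrate, roughly the shape of the fix (just a sketch of `MutableSegmentImpl#getRecord`, assuming `GenericRow.clear()` resets `_nullValueFields` as well):
```import org.apache.pinot.spi.data.readers.GenericRow;

// Sketch only: how getRecord could avoid leaking null-value state between calls.
class GetRecordSketch {
  GenericRow getRecord(int docId, GenericRow reuse) {
    // Reset the reused row first; clearing it empties both the field-to-value
    // map and _nullValueFields, so PartialUpsertHandler no longer sees stale
    // null markers left over from the previously fetched record.
    reuse.clear();
    // ... populate reuse with the column values (and null flags) for docId ...
    return reuse;
  }
}```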
@yupeng: thanks for reporting this. @qiaochu can you take a look?
@qiaochu: thanks for reporting. Taking a look.
@santhosh: An important question: can we use Pinot for this use case? I want to store users and attributes. The attributes will keep increasing, but the user base can remain constant or incrementally increase. The writes and reads on this data store are very high. I am interested in using the query-routing-to-replica-groups feature for reads, while supporting writes at high scale.
@sosyalmedya.oguzhan: Pinot is designed for realtime OLAP operations. It also supports data ingestion in realtime and batch. I'm not sure it fits your case. Do you have any analytical operations?
@santhosh: For my current use case, I don't have any OLAP operations. It's just a point query based on a user id and attribute. I know it supports ingestion in batch and real time. I am not worried about the ingestion part, actually; just whether my use case fits Pinot, since it is not a time-series data model.
@sosyalmedya.oguzhan: you can use cassandra, or redis (if you want a memory-based db), with userid and attribute as the key. Or just use a relational database (like postgres) if your data size is not too large
@sosyalmedya.oguzhan: pinot is not a good option for you
@santhosh: Yes, thank you. I was thinking of going with Cassandra or Aerospike. I will evaluate these two.
@g.kishore: +1 to what @sosyalmedya.oguzhan said. This is not a good fit for Pinot. Any NoSQL database or even a simple relational DBMS will do the job
@mosiac: @mosiac has joined the channel
@roberto: hi!! one quick question about ingestion transformation configuration: is this syntax valid? `fromDateTime(jsonPathString(json.path, '$.timestamp', ''), 'EEE MMM dd HH:mm:ss ZZZ yyyy')` I mean, is it supported to nest transformation functions? I'm trying to convert a JSON string date-time into a long timestamp
@mayanks: Nested transforms are supported in general, so I’d expect this to work too. Please let us know if you see an issue
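@mayanks: For reference, the transform would sit in the table config roughly like this (a sketch; `jsonField` and the output column `timestampMillis` are placeholders for your own names):
```{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "timestampMillis",
        "transformFunction": "fromDateTime(jsonPathString(jsonField, '$.timestamp'), 'EEE MMM dd HH:mm:ss ZZZ yyyy')"
      }
    ]
  }
}```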
@jzanko: @jzanko has joined the channel
@karinwolok1: Today! Check out this meetup! Presentation by @shgandhi and @miliang
@karinwolok1: And in case you missed it, here's the video
#random
@mosiac: @mosiac has joined the channel
@jzanko: @jzanko has joined the channel
#troubleshooting
@chxing: Hi All, I have built the Pinot 0.8.0 release package and deployed it in our env, but I found the Zookeeper page can't be opened
@chxing: `Could not load content for webpack:///js/main.js (HTTP error: status code 404, net::ERR_UNKNOWN_URL_SCHEME)`
@xiangfu0: @sanket fyi
@sanket: ok, I’ll take a look
@chxing: Thx @sanket
@sanket: hi @chxing, which quick start did you try? Just want to make sure I follow your steps and see if I'm able to reproduce it
@chxing: Hi, here is the start command
@chxing: `/opt/cisco/apache-pinot-incubating-0.8.0-bin/bin/pinot-admin.sh StartController -configFileName $CONF_FILE`
@chxing: ```[root@sj1-pinot-controller-02 controller]# cat /etc/pinot_controller.conf
CONF_FILE=/opt/cisco/apache-pinot-incubating-0.8.0-bin/conf/controller.properties
ZK_ADDRESS=10.250.84.82:2181,10.250.87.96:2181,10.250.87.184:2181
CLASSPATH_PREFIX="/opt/cisco/hadoop-2.7.0/share/hadoop/hdfs/hadoop-hdfs-2.7.0.jar:/opt/cisco/hadoop-2.7.0/share/hadoop/common/lib/hadoop-annotations-2.7.0.jar:/opt/cisco/hadoop-2.7.0/share/hadoop/common/lib/hadoop-auth-2.7.0.jar:/opt/cisco/hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0.jar:/opt/cisco/hadoop-2.7.0/share/hadoop/common/lib/guava-11.0.2.jar:/opt/cisco/hadoop-2.7.0/share/hadoop/common/lib/gson-2.2.4.jar:/opt/cisco/hadoop-2.7.0/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar"
JAVA_OPTS="-Xms28G -Xmx32G -XX:+UseG1GC -javaagent:/opt/cisco/apache-pinot-incubating-0.8.0-bin/agent/jmx_prometheus_javaagent-0.12.0.jar=9066:/opt/cisco/apache-pinot-incubating-0.8.0-bin/agent/pinot.yml -Dplugins.dir=/opt/cisco/apache-pinot-incubating-0.8.0-bin/plugins -Dlog4j2.configurationFile=/opt/cisco/apache-pinot-incubating-0.8.0-bin/conf/pinot-controller-log4j2.xml -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:-UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M -Xloggc:/var/log/pinot/controller/gc-%t.log"```
@chxing: `[root@sj1-pinot-controller-02 controller]# cat /opt/cisco/apache-pinot-incubating-0.8.0-bin/conf/controller.properties` controller.data.dir=
@chxing: Hi @sanket, just to confirm: could this be caused by my build issue?
@chxing: I got a failure log when building
@sanket: that `ERROR` line is OK, you can ignore that.
@chxing: Here is my build log; there are many warnings in it
@sanket: I was able to run `quick-start-hybrid` with this release. Can you double check whether you can also run it on your local? What I did was build Pinot from the root directory with `mvn clean install -Denforcer.skip=true -DskipTests -Pbin-dist`, then navigated to the build with `cd pinot-distribution/target/apache-pinot-0.8.0-bin/apache-pinot-0.8.0-bin`, then ran `./bin/quick-start-hybrid.sh` and went to
@chxing: ok i will try
@chxing: Hi @sanket, with your method I still can't open the Zookeeper page
@chxing: Seems you have a session in your local env?
@mosiac: @mosiac has joined the channel
@mosiac: Hello All! Is there anything special about extracting values from a nested JSON object? My (stream) ingestion payload looks like ```{
  "metadata": {
    "queryId": "20210818_091933_00000_j2ma7",
    "transactionId": "89328f28-146d-46c7-b562-f9a436e78bac",
    "query": "select * from postgresql.bnef_930.weather limit 10"
  }
}``` My schema has the fields ```{ "name": "metadata.queryId", "dataType": "STRING" },
{ "name": "metadata.query", "dataType": "STRING" },``` But those don't get populated. Do I need to explicitly flatten metadata? Also, kind of related: is unnesting data supported in release 0.7.1, or is it 0.8 only?
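@mosiac: From the docs it looks like nested fields need either a `complexTypeConfig` to flatten them, or an explicit `jsonPathString` transform per column. This is the flattening variant I'm reading about (a sketch, assuming the default `.` delimiter so the flattened names match my schema; I haven't confirmed it works):
```{
  "ingestionConfig": {
    "complexTypeConfig": {
      "delimiter": "."
    }
  }
}```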
@valentin: Hello, I'm seeing bad performance with an `order by` statement:
• When I run this query: ```select * from datasource_607ec7451360000300516e33 where REGEXP_LIKE(url, '^.*tv.*$') limit 10``` only 240 docs are scanned and I get a reply in 20ms
• When I add an ordering: ```select * from datasource_607ec7451360000300516e33 where REGEXP_LIKE(url, '^.*tv.*$') order by "timestamp" desc limit 10``` 547,212 docs are scanned and I get a reply in >1.5s
Do you have any ideas/tips to improve this?
@mosiac: I'm new to Pinot, so take this with a grain of salt. What is happening here is:
• in query 1, Pinot goes over rows, tries to match the regex, and stops once it finds the first 10 matches
• in query 2, Pinot has to find all rows that match the regex, then order them, and only then can it take the first 10 (or it selects the nth element for 1..10 instead of fully sorting)
Please correct me if I'm wrong; I'm curious how this works internally.
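@mosiac: If that's the case, one thing that might help is making `timestamp` the sorted column, so segments are physically ordered by it. A sketch of the table config bit I have in mind (my assumption, not tested):
```{
  "tableIndexConfig": {
    "sortedColumn": ["timestamp"]
  }
}```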
@jzanko: @jzanko has joined the channel
@syedakram93: Hi, I am trying to migrate existing controller data from local disk to an HDFS deep store. Reference:
@syedakram93: while querying, it times out and doesn't work
@syedakram93: @xiangfu0 tried to help me, but it didn't work out. Can someone help with migrating local data to the deep store?
@syedakram93: @npawar @mayanks
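@syedakram93: For context, my controller config for the HDFS deep store follows the docs, roughly like this (the namenode address and paths here are placeholders):
```controller.data.dir=hdfs://namenode:8020/pinot/controller-data
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher```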
#pinot-dev
@mosiac: @mosiac has joined the channel
#pinot-docsrus
@jmeyer: Hey :wave: Now that 0.8.0 is out, I'm wondering if we've got a list of docs that still need to be updated? I could try to understand some of the new changes and update the docs accordingly
@mayanks: The release is not officially out yet; it will be today, and should come with release notes
@mayanks: Or did you see a release announcement that I missed?
@vananth22: @vananth22 has joined the channel
@changliu: @changliu has joined the channel
@changliu: Hi folks, a QQ: I tried to make changes to pinot-docs, but somehow the modified MD file is shown as binary. Any idea why this happened?
@mayanks: Check this to see if it applies:
@mayanks: I am guessing you have UTF16 characters?
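@mayanks: A quick way to check (assuming GNU `file` and `grep` in a UTF-8 locale):
```file configuration-reference/table.md
# reports the detected encoding, e.g. "Little-endian UTF-16 Unicode text"
grep -naxv '.*' configuration-reference/table.md
# prints any lines that are not valid text in the current locale's encoding```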
@changliu: Thanks @mayanks. I think it is due to UTF16 chars
@changliu: Hi @jackie.jxt, I found that the other MD files are shown in text format but only table.md is shown as binary. For example this
@changliu: Seems I also cannot use iconv to convert the encoding to utf-8, ```iconv: configuration-reference/table.md:1:24285: incomplete character or shift sequence``` Probably because of some of the previous changes.
@jackie.jxt: @changliu Can you please rebase the change and see if it shows correctly
@changliu: It works now. Thanks for fixing it. I also updated my PR.