#general
@jiajunbernoulli: @jiajunbernoulli has joined the channel
@wcxzjtz: hello everyone, wondering if we have an `st_setsrid`-like function (as in crdb) to change the spatial reference system?
@g.kishore: Don't think it's available. I know the H3 library that's used supports it, so it might be a simple UDF to support it cc @yupeng -- something as simple as adding this to ScalarFunctions.java, or overloading the existing functions to take the SRID as an additional parameter: ```@ScalarFunction
public static byte[] setSRID(byte[] bytes, int srid) {
  Geometry geometry = GeometrySerializer.deserialize(bytes);
  geometry.setSRID(srid);
  return GeometrySerializer.serialize(geometry);
}```
@wcxzjtz: gotcha. is our default spatial reference system id `4326` ?
@yupeng: Yes, it's 4326
@yupeng: Adding this func is easy; the hard part is that the serialization does not store the SRID today
@yupeng: Serialization today uses 1 bit to differentiate geography vs geometry, but does not store a general SRID, to save storage.
@yupeng: It's possible to build an extension to this, though
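To make the constraint concrete, here is an illustrative sketch of the idea yupeng describes. This is NOT Pinot's actual serialization format: it only shows how a single header bit can distinguish geography vs geometry, and how a hypothetical extension could append the SRID as extra bytes so a `setSRID` function could round-trip it.

```java
// Illustrative sketch only -- NOT Pinot's actual wire format.
class GeoHeaderSketch {
    private static final int GEOGRAPHY_BIT = 0x80; // top bit of the header byte

    // Pack the geography/geometry flag into a single header byte.
    static byte packHeader(boolean isGeography) {
        return (byte) (isGeography ? GEOGRAPHY_BIT : 0);
    }

    static boolean isGeography(byte header) {
        return (header & GEOGRAPHY_BIT) != 0;
    }

    // Hypothetical extension: append the SRID as 4 extra big-endian bytes.
    static byte[] withSrid(byte header, int srid) {
        return new byte[] {
            header,
            (byte) (srid >>> 24), (byte) (srid >>> 16),
            (byte) (srid >>> 8), (byte) srid
        };
    }

    static int readSrid(byte[] extended) {
        return ((extended[1] & 0xFF) << 24) | ((extended[2] & 0xFF) << 16)
             | ((extended[3] & 0xFF) << 8) | (extended[4] & 0xFF);
    }
}
```
The trade-off is visible here: the current 1-bit flag costs nothing extra per value, while storing a general SRID would add 4 bytes per serialized geometry.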
@wcxzjtz: got it. thanks.
@prashant.pandey: Hello team, I am trying to run the Realtime Provisioner for one of my tables with the following config: `RealtimeProvisioningHelper -tableConfigFile /Users/prashant.pandey/table_config.json -numPartitions 4 -pushFrequency null -numHosts 12 -numHours 2 -sampleCompletedSegmentDir /Users/prashant.pandey/segment_dir -ingestionRate 4750 -maxUsableHostMemory 10G -retentionHours 24` The segment is around 426M in size. But this returns the following: ```Note: * Table retention and push frequency ignored for determining retentionHours since it is specified in command * See
@prashant.pandey: We found why this happened. The problem was that our retention period is 7 days, but we move segments to OFFLINE servers within 3h. I was configuring the retention to be 7 days, due to which `if (activeMemoryPerHostBytes <= _maxUsableHostMemory)` in `MemoryEstimator.java` evaluated to `false`.
@mark.needham: so did you have to update your table config to get this working?
@ssubrama: 1. @prashant.pandey this means that you don't have enough active memory to host all your mem requirements for 24h. You can run the command with higher memory and see what it reports. It will give you a report of mapped vs raw memory as well (which means data is pulled from disk by OS whenever needed). If you are ok with that, then you may be fine with the existing memory/numHosts. Otherwise, you need to increase something. Just to get an idea, you can always run the command with higher memory and more number of hosts (you can give multiple values) and see where you stand.
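Following Subbu's suggestion, a hypothetical invocation exploring several candidate sizings might look like the following (this assumes the helper accepts comma-separated value lists for `-numHosts` and `-numHours`, as recent versions do -- check `--help` on your build; the memory value is an illustrative large bound, not a recommendation):

```
RealtimeProvisioningHelper -tableConfigFile /Users/prashant.pandey/table_config.json \
  -numPartitions 4 -numHosts 8,10,12,14 -numHours 2,4,6,12 \
  -sampleCompletedSegmentDir /Users/prashant.pandey/segment_dir \
  -ingestionRate 4750 -maxUsableHostMemory 48G -retentionHours 24
```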
@ssubrama: @mark.needham not sure why table config needs to change?
@mark.needham: Ah I dunno, was just asking what Prashant had changed to get it to work.
@prashant.pandey: @mark.needham Yes, I actually had to reduce retention from 7 days to 3h in our table config. This was done because segments are stored on realtime servers only for some time, and are then moved to OFFLINE servers. The program actually uses what’s in the supplied config over what’s supplied in the program args. So this 24h was actually moot and not used - it was using the full 7 days as the retention period, as present in the config @ssubrama. I think we can document this special case, and also that the retention period specified in the config takes precedence over the one supplied in prog. args.
@moradi.sajjad: @prashant.pandey that's not the case. Look at the code in RTProvHelper Command where it uses the value:
@moradi.sajjad: And subbu is right. When you get NA, it means the memory is not enough. So use a large number for maxUsableHostMemory parameter so you'll see how much memory you'll need
@alihaydar.atil: Hello everyone :slightly_smiling_face:
@kishorenaidu712: @kishorenaidu712 has joined the channel
@ryantle1028: @ryantle1028 has joined the channel
@kishorenaidu712: Hi, is there any way to view the contents stored in a segment?
@karinwolok1: Meetup tomorrow!! Feel free to share with friends who you think would benefit. :slightly_smiling_face:
@karinwolok1: Welcome :wave: to all the new Apache Pinot :wine_glass: community members! Please tell us who you are and what brought you here! :smiley: @kishorenaidu712 @manish.jaiswal @jiajunbernoulli @naga.b @jatink.5251 @spboora @aliakbari76318 @juan @praveen82 @surya.patnaik1 @imptrik @apte.kaivalya @jma @nouru @achyuthaputha @vvydier @karsumit94 @drew.flintosh @jt @pawel.wasowicz @rautelachetan @bvencill
@makhli: @makhli has joined the channel
@tiger: Hi, just wondering how does replicas work for realtime tables in terms of choosing which replicas to query? From what I can see, it appears that the broker randomly chooses which replica to use when querying.
@tiger: Also, when a replica goes down and returns, does pinot wait for it to recover and catch up before querying it again?
@g.kishore: yes, it randomly selects the replica. As of today, when a replica goes down and returns, the broker does not wait for it to recover and catch up before querying. We thought of adding this but we did not, because in practice the catch-up is very fast and hardly noticeable.. we have seen speeds of 100k events/sec during catch-up..
@tiger: thanks!
#random
@jiajunbernoulli: @jiajunbernoulli has joined the channel
@kishorenaidu712: @kishorenaidu712 has joined the channel
@ryantle1028: @ryantle1028 has joined the channel
@makhli: @makhli has joined the channel
#troubleshooting
@jiajunbernoulli: @jiajunbernoulli has joined the channel
@yeongjukang: Hello team, I wanted to drop a server instance from a cluster to shrink the cluster size. So I executed the command below after a helm chart update but met an error. ```curl -XDELETE localhost:9000/instances/Server_pinot-server-2.pinot-server-headless.dev-pinot.svc.cluster.local_8098 {"_code":409,"_error":"Failed to drop instance Server_pinot-server-2.pinot-server-headless.dev-pinot.svc.cluster.local_8098 - Instance Server_pinot-server-2.pinot-server-headless.dev-pinot.svc.cluster.local_8098 exists in ideal state for user2_REALTIME"}``` • What will happen if I update zk's idealstate of all tables related to server-2 to point to server-1? (table status became healthy again) • Also, will there be automatic copying based on the other segment replicas, to maintain the desired replication?
@g.kishore:
@yeongjukang: @g.kishore Thanks a lot! I tried that one through GUI after server deletion but met this one. ```Caused by: java.net.UnknownHostException: pinot-server-2.pinot-server-headless.dev-pinot.svc.cluster.local```
@yeongjukang: Additionally, I hadn't read the logs before that, but there were always messages saying the segments are balanced.
@mayanks: I think the sequence is to first untag the server, then rebalance, and then remove the server.
@yeongjukang: @mayanks Thanks for the reply. I'm aware of untagging now. Does rebalance internally do the same thing as what I did?
@yeongjukang: It just works now so that's why I am asking
@mayanks: Rebalance does not untag.
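The untag → rebalance → drop sequence Mayank describes maps onto Pinot controller REST endpoints. The sketch below only builds the URLs (no requests are sent); the endpoint paths are taken from the controller API, but verify them against your Pinot version's Swagger UI before scripting against them.

```java
// Builds the controller REST URLs for the drop-a-server sequence:
// 1) untag the server so no new segments land on it,
// 2) rebalance each table so its segments move off the server,
// 3) drop the now-empty instance.
class DropServerSequence {
    private final String controller;

    DropServerSequence(String controllerBaseUrl) {
        this.controller = controllerBaseUrl;
    }

    // Step 1: PUT with an empty tag list removes the server's tags.
    String untagUrl(String instanceName) {
        return controller + "/instances/" + instanceName + "/updateTags?tags=";
    }

    // Step 2: POST, once per table hosted on the server.
    String rebalanceUrl(String tableName, String tableType) {
        return controller + "/tables/" + tableName + "/rebalance?type=" + tableType;
    }

    // Step 3: DELETE, same endpoint as the curl command above.
    String dropUrl(String instanceName) {
        return controller + "/instances/" + instanceName;
    }
}
```
Doing the rebalance before the drop is what avoids the 409 "exists in ideal state" error seen above: the instance must no longer appear in any table's ideal state before it can be deleted.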
@deemish2: Hello team, I am running a query via the Pinot UI using a where clause on some column value == false. It gives the result `--`, even when we use a where clause and filter the value with 0.
@richard892: Hi, I've noticed this myself. @sanket can you take a look at this please? cc @mayanks
@kishorenaidu712: @kishorenaidu712 has joined the channel
@kishorenaidu712: Hey, I recently started using pinot and facing an issue with ingesting data with JSON column. I have marked the column as JSON data type in schema and have used JSON index for the column as well. But when I query the data, I get null value for the entire JSON column. Where did I go wrong?
@mark.needham: Hey - not sure, you'll have to give a bit more information, e.g. • How are you importing the data? • What do your table config/schema look like?
@kishorenaidu712: I am importing the data through batch ingestion from standalone machine.
@mark.needham: ok cool. So you said you get rows returned but they're empty? Can you share the ingestion job spec + a sample of the CSV file that you're ingesting?
@kishorenaidu712:
@kishorenaidu712: Yes the values returned are null, when I try querying the data.
@mark.needham: Thanks. Pinot tries to map the column name in the schema to a field name in each JSON document. So you would need to create a key called `sample` to have this work. If you update your JSON file to read like this: ```{"sample": {"name":{"first":"daffy","last":"duck"},"score":101,"data":["a","b","c","d"]}} {"sample": {"name":{"first":"donald","last":"duck"},"score":102,"data":["a","b","e","f"]}} {"sample": {"name":{"first":"mickey","last":"mouse"} ,"score":103 ,"data":["a" ,"b" ,"g" ,"h"]}} {"sample": {"name":{"first":"minnie" ,"last":"mouse"} ,"score":104 ,"data":["a" ,"b" ,"i" ,"j"]}} {"sample": {"name":{"first":"goofy" ,"last":"dwag"} ,"score":104 ,"data":["a" ,"b" ,"i" ,"j"]}} {"sample": {"person":{"name":"daffy duck" ,"companies":[{"name":"n1" ,"title":"t1"} ,{"name":"n2" ,"title":"t2"}]}}} {"sample": {"person":{"name":"scrooge mcduck" ,"companies":[{"name":"n1" ,"title":"t1"} ,{"name":"n2" ,"title":"t2"}]}}}```
@mark.needham: and then run the ingestion job again
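For reference, the schema side of the mapping Mark describes might look like the fragment below: the column name (`sample` here, matching the JSON key added above) must be declared with the `JSON` data type. The schema name is illustrative; field names must match your data.

```json
{
  "schemaName": "sampleSchema",
  "dimensionFieldSpecs": [
    { "name": "sample", "dataType": "JSON" }
  ]
}
```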
@ryantle1028: @ryantle1028 has joined the channel
@apte.kaivalya: Hey :wave: I am deploying pinot using helm charts on a k8s cluster. I have done it several times before but seeing this issue for the first time. Any ideas? ```Cluster manager: Broker_email-analytics-pinot-broker-1.email-analytics-pinot-broker.email-pinot.svc.test01.k8s.run_8099 disconnected Failed to start Pinot Broker org.apache.helix.HelixException: Cluster structure is not set up for cluster: email-analytics at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:1124) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:701) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:738) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.broker.broker.helix.BaseBrokerStarter.start(BaseBrokerStarter.java:209) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.tools.service.PinotServiceManager.startBroker(PinotServiceManager.java:143) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:92) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:276) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:302) 
[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:276) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] Failed to start a Pinot [BROKER] at 0.691 since launch org.apache.helix.HelixException: Cluster structure is not set up for cluster: email-analytics at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:1124) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:701) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:738) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.broker.broker.helix.BaseBrokerStarter.start(BaseBrokerStarter.java:209) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.tools.service.PinotServiceManager.startBroker(PinotServiceManager.java:143) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:92) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:276) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at 
org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:302) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:276) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-8bbf93aa4377dbdf597e7940670893330452b33f] Shutting down Pinot Service Manager with all running Pinot instances... Shutting down Pinot Service Manager admin application... Deregistering service status handler```
@apte.kaivalya: the only thing changed is I am using a new Zk cluster
@mark.needham:
@mark.needham: I think you need to configure helm to auto restart server/broker/controller on error
@mark.needham: in this post I show how to do it for docker
@apte.kaivalya: thanks I will look at that.
@apte.kaivalya: I think helm already has retries.. because I have seen pods going from failing to running state.
@mark.needham: and still showing this error each time?
@mark.needham: or you see the error message and it's actually working?
@apte.kaivalya: one of the brokers started successfully other one keeps failing
@apte.kaivalya: same with controllers
@mark.needham: with that error?
@apte.kaivalya: yeah.. but it keeps trying
@apte.kaivalya: ideally once a cluster structure is setup on Zk it should work right?
@mark.needham: yeh
@apte.kaivalya: ok looks like the error has gone away.
@mark.needham: oh ok
@mark.needham: That error should be a race condition that only happens the first time that a cluster is formed. I have tried it loads of times to check that assumption and it seems to be true. But let us know if you see it happen again.
@apte.kaivalya: thank you. yes I will notify :eyes:
@xiangfu0: if it's a new zk, make sure you have the controller started before the brokers/servers?
@xiangfu0: For the first time, controller will construct all the paths
@apte.kaivalya: Hmm ok, let me check if I can control the startup order
@makhli: @makhli has joined the channel
@luisfernandez: hey friends, I asked this some time ago. In my company we are trying to move to Pinot from another data source, and we want to validate that whatever we are storing in Pinot is equal to what we have in our separate data source. How can you do this kind of validation with Pinot? Last time it was suggested to treat the underlying topic that our table consumes from as the source of truth - does this still hold true? So you would compare the contents of that topic vs BigQuery? thanks for your help!
@g.kishore: use time based queries and give enough buffer for all sources to catch up..
@g.kishore: e.g. compare ```select count(*) from T where time between t1 and t2;
select sum(metric) from T where time between t1 and t2;
select distinctCount(dim) from T where time between t1 and t2;``` run these on both BigQuery and Pinot
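Kishore's check can be scripted as a small comparison loop. This is a hedged sketch: the two query methods are stubs returning placeholder values, and would need to be wired to real BigQuery and Pinot clients; the table/column names come from the queries above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Runs the same bounded-time aggregates on both stores and collects
// any that disagree. Query methods are stubs -- replace with real clients.
class SourceComparison {
    static long queryPinot(String sql)    { return 42L; } // placeholder
    static long queryBigQuery(String sql) { return 42L; } // placeholder

    // Returns agg -> {pinotValue, bigQueryValue} for each mismatch.
    static Map<String, long[]> diff(String t1, String t2) {
        String where = " from T where time between " + t1 + " and " + t2;
        String[] aggs = {"count(*)", "sum(metric)", "distinctCount(dim)"};
        Map<String, long[]> mismatches = new LinkedHashMap<>();
        for (String agg : aggs) {
            long pinot = queryPinot("select " + agg + where);
            long bq    = queryBigQuery("select " + agg + where);
            if (pinot != bq) {
                mismatches.put(agg, new long[] {pinot, bq});
            }
        }
        return mismatches;
    }
}
```
Per Kishore's advice, pick `t1`/`t2` with enough buffer before "now" that both sources have fully caught up; otherwise ingestion lag shows up as false mismatches.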
@luisfernandez: thank you!