<h3><u>#general</u></h3><br><strong>@yash.agarwal: </strong>Is it possible to
use multiple buckets for S3PinotFs? We have limitations on the amount of data we can store in a single bucket.<br><strong>@mayanks: </strong>@kharekartik
^^<br><strong>@kharekartik: </strong>Hi @yash.agarwal, currently it is not
possible. Let me take a look into what can be done<br><strong>@g.kishore:
</strong>@yash.agarwal what kind of limitation do you
have<br><strong>@yash.agarwal: </strong>@g.kishore We have our buckets limited
to 1TB and 2 million objects, and we are looking to deploy a cluster well over
50TB.<br><strong>@g.kishore: </strong>got it, let me see how we can support multiple buckets.<br><strong>@yash.agarwal: </strong>Sure. Do let me know if I
can do anything to help :slightly_smiling_face:.<br><strong>@g.kishore:
</strong>would love to get your help, created
<#C016ZKW1EPK|s3-multiple-buckets><br><h3><u>#troubleshooting</u></h3><br><strong>@somanshu.jindal:
</strong>Hi, if I want to use a ZooKeeper cluster for a production setup, can I specify all the ZooKeeper hosts when starting the various Pinot components like controller, broker, etc.?
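A minimal sketch (not from the thread) of pointing each Pinot component at a multi-host ZooKeeper quorum via `pinot-admin.sh`; the host names and cluster name are placeholders:
```
# Pass the full ZooKeeper quorum as a comma-separated -zkAddress to every component
bin/pinot-admin.sh StartController -zkAddress zk1:2181,zk2:2181,zk3:2181 -clusterName PinotCluster
bin/pinot-admin.sh StartBroker -zkAddress zk1:2181,zk2:2181,zk3:2181 -clusterName PinotCluster
bin/pinot-admin.sh StartServer -zkAddress zk1:2181,zk2:2181,zk3:2181 -clusterName PinotCluster
```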
<br><strong>@yash.agarwal: </strong>@yash.agarwal has joined the channel<br><strong>@somanshu.jindal: </strong>I need help with hardware requirements for the various components like cores, memory, etc. Also, which components are memory intensive, IO intensive, CPU intensive, etc.? Currently I am thinking of:
• Controller - 2
• Broker - 2
• Servers - 3 (for realtime ingestion)
• Zookeeper (should i go with standalone or cluster?)
As far as I know, segments are stored on the servers and the controller (segment store), right?<br><strong>@yash.agarwal: </strong>Is it possible to use
multiple buckets for S3PinotFs? We have limitations on the amount of data we
can store in a single bucket.<br><strong>@g.kishore: </strong>@somanshu.jindal
For prod, here is a good setup
```controller
- min 2 (for fault tolerance), ideally 3
- 4 core, 4 GB (disk space should be sufficient for logs and temp segments) - 100 GB
Broker
- min 2, add more nodes later as needed to scale
- 4 core, 4 GB (disk space should be sufficient for logs) - 10 GB min
Zookeeper (cluster mode)
- min 3 (this is where the entire cluster state is stored)
- 4 core, 4 GB, disk space sufficient to store logs, transaction logs and snapshots. If you can afford it, go with SSD; if not, plain disk will be fine. 100 GB
Pinot server
- min 2 (this is where the segments will be stored), you can add more servers anytime without downtime
- 8 core, 16 GB, SSD boxes (pick any size that works for your use case: 500 GB to 2 TB or even more)
- If you are running on cloud, you can use mounted SSD instead of local SSD```<br><strong>@pyne.suvodeep: </strong>@pyne.suvodeep has joined the
channel<br><strong>@pradeepgv42: </strong>QQ, wondering how difficult would it
be to include timestampNanos as part of the time column in pinot?
(is it just a matter of pinot parsing and understanding that the timestamp is in nanos, or are there more assumptions around it?)
I believe currently only up to `millis` is supported. Context: we have system-level events (think a stream of syscalls) and want to be able to store the nanos timestamp to fix the order among them; it’s also used by other systems in our infrastructure.
Currently I am storing the nanos value as a separate column and created a `millis` column to serve as the time column, wondering if I can avoid storing the additional duplicate info if the feature is simple enough to add?
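A hedged sketch of the derived-column workaround described above, assuming a Pinot version that supports ingestion transform configs; the column names `event_ts_nanos` and `event_ts_millis` are illustrative:
```
"ingestionConfig": {
  "transformConfigs": [{
    "columnName": "event_ts_millis",
    "transformFunction": "Groovy({event_ts_nanos.intdiv(1000000)}, event_ts_nanos)"
  }]
}
```
Here `event_ts_millis` would serve as the time column, while the raw nanos column is kept for exact ordering.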
<br><strong>@g.kishore: </strong>IMO, nanos cannot be used as a timestamp<br><strong>@g.kishore:
</strong>irrespective of Pinot supporting that datatype<br><strong>@g.kishore:
</strong>nanos is mainly used to measure relative
times<br><strong>@elon.azoulay: </strong>FYI, we have a table which already
exists and I wanted to add a sorted column index, but I'm getting "bad request 400".
Nothing in the controller logs. Can you see what's wrong with the
following?<br><strong>@elon.azoulay: </strong>```curl -f -k -X POST --header
'Content-Type: application/json' -d '@realtime.json'
${CONTROLLER}/tables```<br><strong>@elon.azoulay: </strong>```{
"tableName": "oas_integration_operation_event",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "operation_ts",
"timeType": "SECONDS",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "7",
"segmentPushType": "APPEND",
"segmentPushFrequency": "daily",
"segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
"schemaName": "oas_integration_operation_event",
"replicasPerPartition": "3",
"timeType": "SECONDS"
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"tableIndexConfig": {
"loadMode": "MMAP",
"invertedIndexColumns": [ "service_slug", "operation_type",
"operation_result", "store_id"],
"sortedColumn": ["operation_ts"],
"noDictionaryColumns": [],
"aggregateMetrics": "false",
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "LowLevel",
"stream.kafka.topic.name":
"oas-integration-operation-completion-avro",
"stream.kafka.decoder.class.name":
"org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
"stream.kafka.consumer.factory.class.name":
"org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.decoder.prop.schema.registry.rest.url":
"<https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYVQ6HO-2FNd3WXo8sCVuFwfT0-3DT4te_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxq8wcUqUF3xaBwWeV07JNVyRGy4jlW5PJgT6jQqbHf3TPoY-2FqgmxDrNxIDcaah2om0KvbgMcFLGXrE8ZfpBNvOa9cIJododz1I6dFs45CFYTkxvtRRBjmslWphjLH4q6H1lFMXjU7Oa0hAjVJFMuO-2BC0ULgQjrczkzjbMYZ8ac8tFMZprfJvJ5lZlXAH5d4-2FE-3D>",
"stream.kafka.zk.broker.url": "XXXX/",
"stream.kafka.broker.list": "XXXX:9092",
"realtime.segment.flush.threshold.time": "6h",
"realtime.segment.flush.threshold.size": "0",
"realtime.segment.flush.desired.size": "200M",
"stream.kafka.consumer.prop.auto.isolation.level": "read_committed",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest",
"stream.kafka.consumer.prop.group.id":
"oas_integration_operation_event-load-pinot-llprb",
"stream.kafka.consumer.prop.client.id": "XXXX"
},
"starTreeIndexConfigs": [{ "dimensionsSplitOrder": [ "service_slug",
"store_id", "operation_type", "operation_result" ], "functionColumnPairs": [
"PERCENTILEEST__operation_latency_ms", "AVG__operation_latency_ms",
"DISTINCTCOUNT__store_id", "COUNT__store_id", "COUNT__operation_type" ] }, {
"dimensionsSplitOrder": [ "service_slug", "store_id" ], "functionColumnPairs":
[ "COUNT__store_id", "COUNT__operation_type" ] }]
},
"metadata": {
"customConfigs": {}
}
}```<br><strong>@mayanks: </strong>IIRC, uploading segments to realtime tables
was not possible (a while back, but not sure if it continues to be the
case).<br><strong>@elon.azoulay: </strong>This is just updating the spec for
the table<br><strong>@mayanks: </strong>can you try
swagger?<br><strong>@elon.azoulay: </strong>Sure<br><strong>@elon.azoulay:
</strong>Oh, thanks! Looks like I can't change the time type for the time
column, i.e. segmentsConfig.timeType<br><strong>@mayanks: </strong>Makes sense, that could be backward incompatible.
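As an aside on the 400 above: the controller's `POST /tables` endpoint creates a table, while updates to an existing table's config generally go through a PUT on `/tables/{tableName}`. A minimal sketch reusing the same payload file:
```
# Update (rather than create) the existing table's config
curl -f -k -X PUT --header 'Content-Type: application/json' -d '@realtime.json' ${CONTROLLER}/tables/oas_integration_operation_event
```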
<br><h3><u>#presto-pinot-streaming</u></h3><br><strong>@elon.azoulay: </strong>Here's a link to the design doc:
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMc9VK8AZw4xfCWnhVjqO8F2yNvEnb3JHma9TbSCfyfAx-2FVOn7Bt885qSK47uf3MFF-2FhL8qplE-2FLYisjbzJXY-2FUB7YCnAiPcrkdz5y054MsHzlsZTBtUMD-2BcUlK45ORI42w-3D-3DdUo8_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxq8wcUqUF3xaBwWeV07JNVh4mtLbu51UvID-2BpIVeVfHHAkz-2BQGywKBCG-2BuczerYFmfsSw-2BaUhWdf5KrlyQBpdgjghNzrbFX8rvY73d4ST7SlokoYDYRdCoOTGb1ArYbbIkXTayr2aC97n0VXZH4chsCkI8vMD05ZPq-2FvzlmlID-2FWYWayA-2FwE2RKIfyz6P47zs-3D><br><strong>@g.kishore:
</strong>@jackie.jxt can you please take a look at
this?<br><strong>@jackie.jxt: </strong>Sure<br><strong>@g.kishore:
</strong>@elon.azoulay need access<br><strong>@elon.azoulay: </strong>Try this
one:
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMc9VK8AZw4xfCWnhVjqO8F2yNvEnb3JHma9TbSCfyfAx-2FVOn7Bt885qSK47uf3MFF-2FhL8qplE-2FLYisjbzJXY-2FUB7YCnAiPcrkdz5y054MsHzlsZTBtUMD-2BcUlK45ORI42w-3D-3D32ZZ_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxq8wcUqUF3xaBwWeV07JNVquxchxi3QlvwYIA1-2FNYdsWIcFvbIHp6nKWfN04ATBV0yJvPGfj63ENLE4TNmKIg-2BcbJT6F3swY6J8adylMAjX7HFQOXlImxxHKo7cX7oqBOq-2BDPxsm1a5e4fBK7n4PpmlT6r4qZMmM16VR4YCnDU4w0ygo9mC2b-2BJwiMNVWoK98-3D><br><strong>@g.kishore:
</strong>can you write a few sentences on why we need this and what's the current design?<br><strong>@g.kishore:
</strong><https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMWR9hf84-2BEJYpip6YlEfWjHMb3DE3DtTnj4lc7ywiNxn8nE0KD6t23Jqnbnkq1-2Fazw-3D-3D_TmA_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTxq8wcUqUF3xaBwWeV07JNVzSBoU3zH4HjRuVheDvC3EgsKYdEk1Y6sJnY9wsmnoKBRjducBzXmsKfeziONk-2BOyIWDjmSFdd1orV6HvzPyxRynSRgZCN5CvD8J3b1YDJphT3Nc3t10nYBybTrYtMgwY6TWsi-2B0Dtu-2Fmo7DmxIVTnkAUvz5OUTUwdy7ZFd9iAvI-3D><br><strong>@g.kishore:
</strong>use this diagram<br><strong>@g.kishore:
</strong><br><strong>@g.kishore: </strong>today we are in unary
streaming<br><strong>@g.kishore: </strong>and we want to move to server
streaming<br><strong>@g.kishore: </strong>advantages
• less memory pressure on pinot server<br><strong>@g.kishore: </strong>• presto
workers can start working as soon as chunks arrive<br><strong>@elon.azoulay:
</strong>Sure<br><h3><u>#s3-multiple-buckets</u></h3><br><strong>@g.kishore:
</strong>@g.kishore has joined the channel<br><strong>@yash.agarwal:
</strong>@yash.agarwal has joined the channel<br><strong>@kharekartik:
</strong>@kharekartik has joined the channel<br><strong>@singalravi:
</strong>@singalravi has joined the channel<br><strong>@kharekartik:
</strong>@g.kishore Is there support for multiple directories for FS? If yes, we can extend that to multiple buckets.<br><strong>@kharekartik:
</strong>@yash.agarwal How do you want to split data across
buckets?<br><strong>@g.kishore: </strong>@kharekartik No, I was thinking if
users can provide a list of subFolders/s3buckets, we can pick one randomly or
hash it based on segment name<br><strong>@kharekartik: </strong>Randomly at the
time of creating the segments?<br><strong>@kharekartik: </strong>Wouldn't that
disrupt the query execution?<br><strong>@g.kishore: </strong>no, we just store
the uri along with segment metadata in ZK<br><strong>@g.kishore: </strong>it
can point to anything<br><strong>@g.kishore: </strong>actually, this is a
problem only with real-time where we create the URI<br><strong>@g.kishore:
</strong>with batch ingestion, user can provide any
URI<br><strong>@yash.agarwal: </strong>We don’t have any specific requirement around how to split data across buckets.<br><strong>@pradeepgv42:
</strong>@pradeepgv42 has joined the channel<br><strong>@kharekartik:
</strong>Ok. Then I believe the change needs to be done in the handling of the ingestion config, and then picking a random directory while creating segments. The S3 filesystem implementation won't need any change unless the buckets are located in different regions.<br><strong>@yash.agarwal: </strong>all the buckets are co-located.<br><strong>@g.kishore: </strong>Yash, is this realtime or
offline<br><strong>@yash.agarwal: </strong>Right now it is only
offline.<br><strong>@g.kishore: </strong>then you don't need anything for now<br><strong>@g.kishore: </strong>I am guessing you will use the ingestion-job to generate the segments
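A rough illustration of that batch path, under the assumption that each offline ingestion job spec sets its own `outputDirURI` (the spec file names and bucket split are hypothetical):
```
# Hypothetical: one job spec per target bucket, each pointing outputDirURI at a different s3:// location
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /specs/events-bucket1.yaml
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /specs/events-bucket2.yaml
```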
<br><strong>@vallamsetty: </strong>@vallamsetty has joined the channel<br><strong>@yash.agarwal:
</strong>Yeah I realised that too. I am very new to this so sorry for any
troubles :slightly_smiling_face:<br><strong>@g.kishore: </strong>no worries, this is a good feature to have. If you don't mind, can you create an issue?<br>