glasser commented on issue #7048: Make IngestSegmentFirehoseFactory splittable for parallel ingestion
URL: https://github.com/apache/incubator-druid/pull/7048#issuecomment-462529816

Good question. I was kind of imagining you would set taskGranularity equal to your output segmentGranularity so that each subtask would write one segment. You're right that things can get unbalanced, though.

Are you imagining that the split implementation would query the segment metadata to learn all the segment sizes, and that the user would specify bytes per split? Would we try not to divide any input segments, and just chunk them together?

This seems like a reasonable option to want, but I kind of feel like people might still want to get started with the simpler "I know my peons can handle an hour of data, just split by hours" approach anyway... so implementing one of these options doesn't necessarily stop us from implementing the other later.
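
To make the "bytes per split" idea concrete, here is a rough sketch of what chunking whole segments together might look like. It is only an illustration of the idea discussed above, not code from this PR or from Druid: `SegmentInfo`, `chunk_segments`, and `max_bytes_per_split` are hypothetical names, the sizes are made up, and it assumes the segment sizes have already been read from the metadata store.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SegmentInfo:
    # Hypothetical stand-in for one row of segment metadata; not a Druid class.
    segment_id: str
    size_bytes: int


def chunk_segments(
    segments: List[SegmentInfo], max_bytes_per_split: int
) -> List[List[SegmentInfo]]:
    """Group whole segments, in order, into splits of at most max_bytes_per_split.

    A segment is never divided. A single segment larger than the limit
    ends up in a split of its own.
    """
    if max_bytes_per_split <= 0:
        raise ValueError("max_bytes_per_split must be positive")

    splits: List[List[SegmentInfo]] = []
    current: List[SegmentInfo] = []
    current_bytes = 0
    for seg in segments:
        # Close the current split if adding this segment would exceed the limit.
        if current and current_bytes + seg.size_bytes > max_bytes_per_split:
            splits.append(current)
            current = []
            current_bytes = 0
        current.append(seg)
        current_bytes += seg.size_bytes
    if current:
        splits.append(current)
    return splits


# Made-up hourly segments with uneven sizes.
segments = [
    SegmentInfo("2019-02-01T00", 400),
    SegmentInfo("2019-02-01T01", 300),
    SegmentInfo("2019-02-01T02", 900),
    SegmentInfo("2019-02-01T03", 100),
    SegmentInfo("2019-02-01T04", 200),
]
splits = chunk_segments(segments, max_bytes_per_split=800)
split_ids = [[s.segment_id for s in split] for split in splits]
```

With these made-up sizes, the two small leading hours share a split, the oversized hour gets its own, and the last two hours share one. Splitting purely by hour would instead produce five subtasks of very different sizes, which is the imbalance mentioned above.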
