paul-rogers commented on pull request #2485: URL: https://github.com/apache/drill/pull/2485#issuecomment-1077279388
@jnturton you make good points. I somewhat worry that folks who build large distributed systems are a bit under-represented in our current wonderful group of contributors, so I do see a drift toward small-scale concerns. (For example, I've never seen any shop query 1000s of Excel files, but folks do that every day for JSON, Parquet and CSV.) Similarly, the data science folks want Drill do to it all in SQL. But, at scale, people build out data pipelines so that the files which Drill queries are the result of a process to produce a query-optimized format such as well-partitioned Parquet. The idea of having two forks (or modes) was an attempt to preserve Drill's distributed system heritage, while also allowing the current contributors to go wild on the data science use cases. Maybe a simpler solution is to just have different "bootstrap" files: the at-scale edition, the data science edition, etc. Each addition includes plugins and defaults optimized for the target use case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org