Hi, I'm looking to do a dissertation on Drill as part of a master's degree in Data Science. I'm hoping to set up a cluster to run it and then analyse its efficiency with different datasets, as well as make recommendations for its usage. I know Drill is at a fairly early stage of development, but I have around 18 months until the project is due, so I'm hoping the timing will work out as Drill is developed further.
I'd be grateful for any advice on how to get started. Would a Hadoop cluster be a good back-end to base my project on, or would something better suited to nested data, like MongoDB, be more appropriate? Also, I haven't found much documentation on configuring Drill in a distributed environment, so any help with this would be appreciated. I'd also be willing to contribute, but I'm not sure I have enough Java experience. My background is mainly in BI and database technologies.

Thanks,
Tom
