Hi All,

I wanted to pass along some good foundational material about databases. We find ourselves immersed day-to-day in the details of Drill's implementation. It is helpful to occasionally step back and look at the larger DB tradition in which Drill resides. This material is especially good for anyone who didn't study DB theory in college.

"Architecture of a Database System": http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf - By Hellerstein, Stonebraker, and Hamilton. While focused on "classic" DB systems, the ideas readily apply to "Big Data" distributed engines such as Drill. It walks through many of the basic architectural choices. You'll find yourself saying, "I see, Drill chose the shared-nothing, OS thread model, but random heap allocation rather than a buffer pool." That is, you can see Drill's design choices in the context of the overall DB solution space.

"Database Management Systems", 3e, by Ramakrishnan & Gehrke. A textbook-length overview of DB theory. I used the second edition years ago to design and build a complete embedded hybrid DB and object store. I keep returning to the book any time I need a refresher on some topic or other.

What other favorites do people have? Does anyone know of any good references that explain the rule-based architecture of a planner such as Calcite? (R&G, 2e, mostly discuss the classic "dynamic programming" style of planner.)

Thanks,

- Paul
