A few clarifications: - Presto supports hash-based distributed joins as well as broadcast joins
- Presto metadata is stored in ZooKeeper, but metadata storage is pluggable and could be stored in Accumulo instead - The connector does use tablet locality when scanning Accumulo, but our testing has shown you get better performance by giving Accumulo and Presto their own dedicated machines, making locality a moot point. This will certainly change based on types of queries, data sizes, network quality, etc. - You can insert the results of a query into a Presto table using INSERT INTO foo SELECT ..., as well as create a table from the results of a query (CTAS). Though, for large inserts, it is typically best to bypass the Presto layer and insert directly into the Accumulo tables using the PrestoBatchWriter API Cheers, --Adam On Mon, Jun 13, 2016 at 7:20 AM, Christopher <[email protected]> wrote: > Thanks for that summary, Dylan! Very helpful. > > On Mon, Jun 13, 2016, 01:36 Dylan Hutchison <[email protected]> > wrote: > > > Thanks for sharing Sean. Here are some notes I wrote after reading the > > article on Presto-Accumulo design. I have a research interest in the > > relationship between relational (SQL) and non-relational (Accumulo) > > systems, so I couldn't resist reading the post in detail. > > > > - Places the primary key in the Accumulo row. > > - Performs row-at-a-time processing (each tuple is one row in > > Accumulo) using WholeRowIterator behavior. > > - Relational table metadata is stored in the Presto infrastructure (as > > opposed to an Accumulo table). > > - Supports the creation of index tables for any attributes. These > > index tables speed up queries that filter on indexed attributes. It > is > > standard secondary indexing, which provides speedups when the > selectivity > > of the query is roughly <10% of the original table. > > - Only database->client querying is supported. You cannot run "select > > ... into result_table". > > - As far as I can see, Presto only has one join strategy: *broadcast > > join*. The right table of every join is scanned into one of the > > Presto worker's memory. Subsequently the size of the right table is > > limited by worker memory. > > - There is one Presto worker for each Accumulo tablet, which enables > > good scaling. > > - The Presto bridge classes track internal Accumulo information such > > as the assignment of tablets to tablet servers by reading Accumulo's > > Metadata table. Presto uses tablet locations to provide better > locality. > > - The Presto bridge comes with several Accumulo server-side iterators > > for filtering and aggregating. > > - The code is quite nice and clean. > > > > This image below gives Presto's architecture. Accumulo takes the role of > > the DB icon in the bottom-right corner. > > > > [image: Inline image 2] > > > > Bloomberg ran 13 out of the 22 TPC-H queries. There is no fundamental > > reason why they cannot run all the queries; they just have not > implemented > > everything required ('exists' clauses, non-equi join, etc.). > > > > The interface looks like this, though they use a compiled java jar to > > insert entries from a csv file (it wraps around a BatchWriter). > > > > [image: Inline image 3] > > > > Here are performance results. They don't say what hardware or data sizes > > they use. Whatever it is, they must have the ability to fit the smaller > > table of any join into memory as a result of Presto's broadcast join > > strategy. The strong scaling looks very nice. > > > > [image: Inline image 4] > > > > They have one other plot that shows how secondary indexing speeds up some > > queries with low selectivity. > > > > Cheers, Dylan > > > > > > > > On Sun, Jun 12, 2016 at 7:06 PM, Sean Busbey <[email protected]> > wrote: > > > >> Bloomberg have a post about a connector they made to query Accumulo from > >> Presto: > >> > >> > >> > http://www.bloomberg.com/company/announcements/open-source-at-bloomberg-reducing-application-development-time-via-presto-accumulo/ > >> > >> -- > >> Sean Busbey > >> > > > > >
