I don't think any real locking is required. The filesystem already provides atomic operations. We can simply use optimistic locking and the existing atomic delete and create, along with a numbering scheme similar to Lucene's index versioning scheme.
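To make that concrete, here is a minimal sketch of a Lucene-style versioned commit built only on atomic file creation. The class and the metadata.N naming convention are hypothetical, not anything Drill ships; on HDFS the same pattern would use FileSystem.create() with overwrite=false, which is likewise atomic.

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch: lock-free, optimistic metadata commits. Each commit
 * writes "metadata.N" where N is one past the highest existing generation;
 * the atomic CREATE_NEW acts as the conflict check.
 */
public class VersionedMetadataWriter {

  /** Find the highest committed generation, or -1 if none exists. */
  static long latestGeneration(Path tableDir) throws IOException {
    long latest = -1;
    try (DirectoryStream<Path> files =
             Files.newDirectoryStream(tableDir, "metadata.*")) {
      for (Path p : files) {
        // Assumes well-formed names; a robust version would skip strays.
        String suffix =
            p.getFileName().toString().substring("metadata.".length());
        latest = Math.max(latest, Long.parseLong(suffix));
      }
    }
    return latest;
  }

  /**
   * Try to commit the next generation. CREATE_NEW fails atomically if
   * another writer committed that generation first; the caller then
   * re-reads the latest metadata, re-merges, and retries.
   */
  static boolean tryCommit(Path tableDir, byte[] metadata) throws IOException {
    long next = latestGeneration(tableDir) + 1;
    Path target = tableDir.resolve("metadata." + next);
    try {
      Files.write(target, metadata, StandardOpenOption.CREATE_NEW);
      return true;                  // we won the race for this generation
    } catch (FileAlreadyExistsException lost) {
      return false;                 // a concurrent writer won; retry
    }
  }
}

A writer that loses the race simply re-reads the newest generation, re-merges, and retries; readers always see a complete, immutable generation, which is what gives the snapshot-like behavior without locks.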
With regards to inserts, I don't see why we wouldn't allow many
simultaneous inserts.

On Aug 4, 2015 11:19 PM, "Aman Sinha" <[email protected]> wrote:

> Some thoughts around concurrency for INSERTs and SELECTs:
>
> For the consistency model we could aim to provide something close to
> snapshot isolation.
> Only 1 INSERT to a particular table is allowed at a time.
> SELECT queries happen concurrently while the INSERT is going on, but
> in reality there is a finer granularity of steps.
>
> For an INSERT operation:
> 1. Create a temp directory (or hidden directory) to store the files,
>    say foo/tmpdir for the table foo
> 2. Start writing files to foo/tmpdir
> 3. When complete, rename foo/tmpdir to indicate that it is available
>    for queries. <--- Q: is there a need for short-term directory-level
>    locking?
> 4. Call REFRESH METADATA, which merges the existing schema with the
>    new files and collects stats on the new files
>    4(a). During this process, a metadata write lock needs to be
>    acquired
>
> For any query that is running concurrently with the INSERT, we should
> only block for step 4(a). This is a relatively short period of time
> compared to the overall elapsed time of the INSERT.
>
> The SELECT queries themselves have the following steps:
> 1. Query planning
>    1(a) Read the table metadata (presumably from the metadata file)
>         by first acquiring a read lock on the metadata, reading the
>         metadata, and releasing the lock.
>    1(b) Perform query optimization
> 2. Query execution and returning of results
>
> The only blocking step is 1(a).
>
> Q: Is it possible to completely avoid any locks, and what would be
>    the consistency model that we could claim? We are not building a
>    transactional system, but we do need well-defined, deterministic
>    behavior in the presence of concurrency.
> Q: If we do acquire locks, what should be their nature... are they
>    persistent, etc.?
>
> Aman
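A minimal sketch of INSERT steps 1-4 from the mail above. All names are illustrative, and the in-process ReentrantReadWriteLock merely stands in for whatever coordination step 4(a) would actually use; across Drillbits it would have to be a distributed lock (e.g., in ZooKeeper).

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative sketch of the four INSERT steps described above. */
public class InsertCommit {

  // Stand-in for the metadata lock of step 4(a). In-process only; across
  // Drillbits this would need to be a distributed lock.
  static final ReentrantReadWriteLock METADATA_LOCK =
      new ReentrantReadWriteLock();

  static void insert(Path tableDir, byte[]... newFiles) throws IOException {
    // Step 1: a dot-prefixed temp directory, assuming the engine treats
    // dot-prefixed paths as hidden so concurrent SELECTs never see it.
    Path tmpDir = Files.createTempDirectory(tableDir, ".insert_");

    // Step 2: write the new files into the temp directory.
    for (int i = 0; i < newFiles.length; i++) {
      Files.write(tmpDir.resolve("part-" + i + ".parquet"), newFiles[i]);
    }

    // Step 3: one atomic rename makes all new files visible at once.
    Path visible = tableDir.resolve("batch_" + System.currentTimeMillis());
    Files.move(tmpDir, visible, StandardCopyOption.ATOMIC_MOVE);

    // Step 4 / 4(a): refresh metadata under a short write lock; concurrent
    // queries block only here, not for the duration of the whole INSERT.
    METADATA_LOCK.writeLock().lock();
    try {
      refreshMetadata(tableDir);
    } finally {
      METADATA_LOCK.writeLock().unlock();
    }
  }

  /** Placeholder for the schema merge and stats collection of step 4. */
  static void refreshMetadata(Path tableDir) {
  }
}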
>
> On Thu, Jul 30, 2015 at 9:28 AM, Aman Sinha <[email protected]> wrote:
>
> > Two more observations:
> > 1. I would think the merged schema should be deterministic and not
> >    depend on the order of scanning the files. How can this be
> >    enforced? Timestamp order, lexicographical order? ... but these
> >    files are being read in parallel, so we have to think this
> >    through.
> > 2. Due to the merging of schemas, a column that was of REQUIRED type
> >    in the original Parquet file may end up having NULL values after
> >    merging. If we do a CTAS of the merged schema, this column will
> >    be of OPTIONAL type in Parquet. Is this acceptable?
> >
> > On Wed, Jul 29, 2015 at 7:42 PM, Jacques Nadeau <[email protected]>
> > wrote:
> >
> >> One comment... I think this is an overreach:
> >>
> >> "The union schema is saved in a .drill file"
> >>
> >> There is no point in saving something derived in a .drill file.
> >> Maybe a cache file (extending the other work on metadata caching).
> >> The idea for .drill files is that they are supplementary to what
> >> can be acquired from the data directly.
> >>
> >> --
> >> Jacques Nadeau
> >> CTO and Co-Founder, Dremio
> >>
> >> On Wed, Jul 29, 2015 at 6:58 PM, Parth Chandra <[email protected]>
> >> wrote:
> >>
> >> > Notes based on the discussion in the hangout, Jacques' notes, and
> >> > later discussion:
> >> >
> >> > 1. For Parquet files:
> >> >    - Merge schema information into a union schema.
> >> >    - The union schema is saved in a .drill file.
> >> >    - The union schema can be cached.
> >> >    - The union schema should maintain the ordinal position of
> >> >      fields.
> >> > 2. At the time the table is read, we can check the directory
> >> >    timestamp and regenerate the merged schema.
> >> > 3. INSERT INTO does not need to update the merged schema. (Should
> >> >    it check whether an incompatible schema is being written? An
> >> >    incompatible schema is one that cannot be merged successfully.)
> >> >    - Suggestion 1 - Allow flexible inserts. (Issue a warning if
> >> >      there is going to be a schema change conflict.) Queries on
> >> >      columns with no conflict can still run.
> >> >    - Suggestion 2 - Do not allow the insert unless the schema can
> >> >      be merged (see the merge sketch after these notes).
> >> > 4. Rules for merging can be relaxed as we improve our ability to
> >> >    promote from one type to another.
> >> > 5. The concept of a schema change is different from that of a
> >> >    merged schema. Schema change is needed for reading schema-less
> >> >    sources like JSON.
> >> > 6. If there is a different column order (or a difference in the
> >> >    number of columns) across the table's files, an insert based
> >> >    on ordinal positions cannot be allowed.
> >> > 7. Suggestion - go with a name-based insert. An ordinal-based
> >> >    insert can be enabled with a switch (?)
> >> > 8. Reuse the partitioning information from CREATE TABLE in INSERT
> >> >    INTO. No partitioning info is needed in the INSERT INTO
> >> >    statement.
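A simplified sketch of the mergeability check behind items 1-4 above. Column types are plain strings standing in for Drill's actual Parquet metadata classes; the LinkedHashMap's insertion order supplies the ordinal-position guarantee from item 1.

import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified, hypothetical sketch of the union-schema merge. */
public class SchemaMerger {

  /**
   * Merge one file's schema into the running union schema. Previously
   * seen fields keep their ordinal position; new fields are appended.
   * Returns false when a column appears with an unmergeable type -- the
   * "incompatible schema" case of item 3.
   */
  static boolean mergeInto(LinkedHashMap<String, String> union,
                           Map<String, String> fileSchema) {
    for (Map.Entry<String, String> col : fileSchema.entrySet()) {
      String existing = union.get(col.getKey());
      if (existing == null) {
        union.put(col.getKey(), col.getValue());   // new column: append
      } else if (existing.equals(col.getValue())) {
        // identical type: nothing to do
      } else if (promotable(existing, col.getValue())) {
        union.put(col.getKey(), col.getValue());   // widen to promoted type
      } else {
        return false;                              // unmergeable conflict
      }
    }
    return true;
  }

  /** Item 4: the promotion rules that can be relaxed over time. */
  static boolean promotable(String from, String to) {
    return from.equals("INT") && to.equals("BIGINT");  // one example rule
  }
}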
> >> >
> >> > On Mon, Jul 27, 2015 at 6:43 PM, Khurram Faraaz
> >> > <[email protected]> wrote:
> >> >
> >> > > I have a question related to inserting into partitioned tables.
> >> > > I see that DB2, Oracle, and Postgres do not support a PARTITION
> >> > > BY clause in their INSERT INTO implementations. MySQL does,
> >> > > however, support a PARTITION BY clause in its INSERT INTO
> >> > > implementation. I do not know what the ANSI SQL standard says
> >> > > about the INSERT INTO statement. We may want to keep it
> >> > > compliant with standard SQL syntax.
> >> > >
> >> > >> B) Support insert into a partitioned table
> >> > >> ----------------------
> >> > >>
> >> > >> Ex: INSERT INTO T1 col1, col2, col3 PARTITION BY col1, col2
> >> > >>     SELECT col4, col5, col6 FROM T2
> >> > >>
> >> > >> * Target column specification is required when inserting data
> >> > >>   into an already partitioned table.
> >> > >> * Requirements A.3-A.7 above apply to inserts into partitioned
> >> > >>   tables as well.
> >> > >> * A PARTITION BY clause with one or more columns is required.
> >> > >> * All the columns specified in the PARTITION BY clause must
> >> > >>   exist in the target column list.
> >> > >> * The PARTITION BY columns specified do not need to match the
> >> > >>   columns the original table was partitioned with (i.e., if
> >> > >>   the original table is partitioned by col1, col2, new data
> >> > >>   during an insert can be partitioned by col3, or by just col1
> >> > >>   or col2...).
> >> > >
> >> > > On Mon, Jul 27, 2015 at 5:40 PM, Mehant Baid
> >> > > <[email protected]> wrote:
> >> > >
> >> > > > I wanted to start a conversation around supporting the
> >> > > > "Insert into Table" feature. As of 1.2 we initially want to
> >> > > > support inserting into a table with Parquet files. Support
> >> > > > for JSON, CSV, and other sources will follow as future
> >> > > > enhancements.
> >> > > >
> >> > > > Aman, Jinfeng, Neeraja and I had an initial discussion about
> >> > > > this, and Neeraja provided a good summary of our discussion
> >> > > > (pasted below), also stating some of the requirements for
> >> > > > this feature.
> >> > > >
> >> > > > A) Support insert into a non-partitioned table
> >> > > > ---------------------
> >> > > >
> >> > > > Ex: INSERT INTO T1 [col1, col2, col3] SELECT col4, col5, col6
> >> > > >     FROM T2
> >> > > > (Source table: T2, target table: T1)
> >> > > > Requirements:
> >> > > >
> >> > > > 1. The target table column list specification is optional for
> >> > > >    an INSERT statement.
> >> > > > 2. When specified, the column list in the INSERT statement
> >> > > >    should contain all the columns present in the target table
> >> > > >    (i.e., no support for partial inserts).
> >> > > > 3. The column names specified for the source table do not
> >> > > >    need to match the target table column names. Matching is
> >> > > >    performed by ordinal.
> >> > > > 4. The number of source table columns specified must be the
> >> > > >    same as the number of target table columns.
> >> > > > 5. The types of the specified source table columns must match
> >> > > >    the types of the target table columns (see the validation
> >> > > >    sketch after this list).
> >> > > > 6. Specification of * is not allowed in the SELECT syntax.
> >> > > > 7. The SELECT syntax can specify constant values for one or
> >> > > >    more columns.
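A sketch of how requirements 3-5 above could be checked at planning time. The helper and its string-based types are hypothetical; the real planner would work with Drill's type objects.

import java.util.List;

/** Hypothetical plan-time validation of requirements 3-5 above. */
public class InsertValidator {

  /**
   * Match source columns to target columns by ordinal (requirement 3),
   * enforcing equal column counts (requirement 4) and matching types
   * (requirement 5). Types are strings purely for illustration.
   */
  static void validate(List<String> targetTypes, List<String> sourceTypes) {
    if (targetTypes.size() != sourceTypes.size()) {
      throw new IllegalArgumentException(
          "Source column count " + sourceTypes.size()
              + " does not match target column count " + targetTypes.size());
    }
    for (int i = 0; i < targetTypes.size(); i++) {
      if (!targetTypes.get(i).equals(sourceTypes.get(i))) {
        throw new IllegalArgumentException(
            "Type mismatch at ordinal " + i + ": target "
                + targetTypes.get(i) + " vs source " + sourceTypes.get(i));
      }
    }
  }
}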
> >> > > >
> >> > > > B) Support insert into a partitioned table
> >> > > > ----------------------
> >> > > >
> >> > > > Ex: INSERT INTO T1 col1, col2, col3 PARTITION BY col1, col2
> >> > > >     SELECT col4, col5, col6 FROM T2
> >> > > >
> >> > > > * Target column specification is required when inserting data
> >> > > >   into an already partitioned table.
> >> > > > * Requirements A.3-A.7 above apply to inserts into partitioned
> >> > > >   tables as well.
> >> > > > * A PARTITION BY clause with one or more columns is required.
> >> > > > * All the columns specified in the PARTITION BY clause must
> >> > > >   exist in the target column list.
> >> > > > * The PARTITION BY columns specified do not need to match the
> >> > > >   columns the original table was partitioned with (i.e., if
> >> > > >   the original table is partitioned by col1, col2, new data
> >> > > >   during an insert can be partitioned by col3, or by just
> >> > > >   col1 or col2...).
> >> > > >
> >> > > > A couple of open questions from the design perspective:
> >> > > >
> >> > > > 1. How do we perform validation: validation of data types,
> >> > > >    the number of columns being inserted, etc.? In addition to
> >> > > >    validation, we need to make sure that when we insert into
> >> > > >    an existing table we insert data with the existing column
> >> > > >    names (the SELECT column list can have different names).
> >> > > >    This poses problems around needing to know the metadata at
> >> > > >    planning time. Two approaches that have been floating
> >> > > >    around are:
> >> > > >    * DotDrill files: We can store metadata, partitioning
> >> > > >      columns, and other useful information here, and we can
> >> > > >      perform validation at planning time. However, the
> >> > > >      challenges with introducing DotDrill files include:
> >> > > >      - consistency between the metadata and the actual data
> >> > > >        (nothing prevents users from copying files in directly)
> >> > > >      - security around DotDrill files (can be dealt with in
> >> > > >        the same way we perform security checks for Drill
> >> > > >        tables in HDFS)
> >> > > >      - an interface to change the DotDrill file, in case we
> >> > > >        need to add a column to the table, add a new
> >> > > >        partition, etc.
> >> > > >    * Explicit syntax / no-metadata approach: Another approach
> >> > > >      is to avoid DotDrill files and use explicit syntax to
> >> > > >      glean as much information as possible from the SQL
> >> > > >      statement itself. Some of the challenges with this
> >> > > >      approach are:
> >> > > >      - Gathering metadata information: Since we have no idea
> >> > > >        what the existing schema is, we would need to perform
> >> > > >        a "mini scan" to learn the schema at planning time in
> >> > > >        order to perform some validation. The problem with
> >> > > >        this approach is: how do we determine how many files
> >> > > >        we need to read in order to learn the schema? If we
> >> > > >        use a sample set and not all the files have the same
> >> > > >        schema, we could have non-deterministic results based
> >> > > >        on the sample of files read. Also, reading all the
> >> > > >        files and merging the schemas seems like an expensive
> >> > > >        cost to pay.
> >> > > >      - From the user's perspective, while inserting into a
> >> > > >        partitioned table, the user will have to specify the
> >> > > >        partitioning columns again in the INSERT statement,
> >> > > >        despite having specified the partition columns in the
> >> > > >        CTAS.
> >> > > >
> >> > > > 2. What is a reasonable assumption for a Drill table in terms
> >> > > >    of changing schema? Is having the exact same schema for
> >> > > >    all files in a table too rigid an assumption at this
> >> > > >    point?
> >> > > >
> >> > > > One thing to remember with the DotDrill file is to also
> >> > > > consider the repercussions on DROP TABLE, SHOW TABLES,
> >> > > > DESCRIBE TABLE, etc.; i.e., it might make it easier to
> >> > > > support these operations.
> >> > > >
> >> > > > Thanks
> >> > > > Mehant
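One closing illustration: Jacques' cache-file suggestion, combined with item 2 of the hangout notes (check the directory timestamp and regenerate the merged schema), could look roughly like the hypothetical staleness check below. It also speaks to Mehant's consistency concern, since files copied directly into the table directory bump its modification time and invalidate the cache, at least for files added at the top level.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/** Hypothetical staleness check for a cached union schema. */
public class SchemaCache {

  /**
   * The cached schema is trusted only if the cache file is at least as
   * new as the table directory; otherwise the merged schema must be
   * regenerated (item 2 of the hangout notes).
   */
  static boolean cacheIsFresh(Path tableDir, Path cacheFile)
      throws IOException {
    if (!Files.exists(cacheFile)) {
      return false;
    }
    FileTime dirModified = Files.getLastModifiedTime(tableDir);
    FileTime cacheModified = Files.getLastModifiedTime(cacheFile);
    return cacheModified.compareTo(dirModified) >= 0;
  }
}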
