The description I sent covers the planner, but there is of course also a
run-time component, which would consist of a 'RecordWriter' for the
underlying DB.  In the case of MapR-DB, this RecordWriter would simply call
the underlying PUT or Bulk PUT API.  In addition, we need to figure out the
tablet/region affinity.  For single-row inserts, doing a remote write from
the foreman node may be okay, but for INSERT-SELECT type operations, where
the SELECT side is producing millions of rows and has already been
parallelized, the rows need to be inserted through a parallel bulk insert.
So we would want to range-partition the rows based on the tablet rowid
ranges, such that rows belonging to the same tablet are 'grouped together'
and two minor fragments in Drill don't try to write to the same tablet.
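
To make the range-partitioning idea concrete, here is a minimal sketch in
Java.  It is only an illustration, not existing Drill or MapR-DB code: the
class name TabletRangePartitioner and its methods are hypothetical, and it
assumes that the tablet start keys are already known to the planner and that
row keys can be compared as Java strings (a real implementation would more
likely compare raw bytes).  Each tablet is owned by exactly one writer
fragment, so no two fragments write to the same tablet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: routes row keys to writer fragments by tablet range. */
class TabletRangePartitioner {
    // Tablet start key -> id of the single writer fragment that owns the tablet.
    private final TreeMap<String, Integer> ownerByStartKey = new TreeMap<>();

    TabletRangePartitioner(List<String> tabletStartKeys, int numFragments) {
        if (tabletStartKeys.isEmpty() || numFragments < 1) {
            throw new IllegalArgumentException("need at least one tablet and one fragment");
        }
        // Sort and de-duplicate the start keys so tablets are in key order.
        List<String> sorted = new ArrayList<>(new TreeSet<>(tabletStartKeys));
        int numTablets = sorted.size();
        // Never use more writers than there are tablets.
        int writers = Math.min(numFragments, numTablets);
        for (int i = 0; i < numTablets; i++) {
            // Contiguous runs of tablets go to the same fragment.
            ownerByStartKey.put(sorted.get(i), i * writers / numTablets);
        }
    }

    /** Returns the id of the fragment that owns the tablet containing rowKey. */
    int fragmentFor(String rowKey) {
        // The owning tablet is the one with the greatest start key <= rowKey.
        Map.Entry<String, Integer> e = ownerByStartKey.floorEntry(rowKey);
        if (e == null) {
            // Key sorts before every start key: treat it as part of the first tablet.
            e = ownerByStartKey.firstEntry();
        }
        return e.getValue();
    }
}
```

With start keys "", "g", "n", "t" and two fragments, this sketch sends keys
in the first two tablets to fragment 0 and keys in the last two tablets to
fragment 1.  In Drill, something like fragmentFor() would feed a
range-partitioning exchange below the writer.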

Aman


On Tue, May 28, 2019 at 12:50 PM Aman Sinha <[email protected]> wrote:

> Yes, Calcite already supports the INSERT/UPSERT syntax.  Within Drill, you
> would need to 'unblock' this syntax (not all of it but whatever variation
> we may want to support). You can take a look at DrillParserImpl.java
> (SqlInsert() method) which is actually a generated file from JavaCC.
>
> We would need to look at the Calcite logical plan that is created for the
> DML statements such as this and then determine the corresponding Drill
> logical/physical plan.  Since I haven't seen a Calcite logical plan with
> DML operators yet, I am not completely sure but if it follows the standard
> logical plan, then in Drill we need the following:
>
>   - since this would only be supported for a specific storage/format
> plugin, there should be an early validation check of the data source (in
> the FROM clause) to ensure that it qualifies
>   - a logical rule that converts the Calcite logical plan node to a Drill
> logical plan node (for example, see *DrillProjectRule*.java)
>   - a logical rel that represents the plan node, e.g. DrillInsertRel
> (for example, see the existing *DrillProjectRel*)
>   - a physical rule (e.g. see *ProjectPrule*)
>   - a physical rel (e.g. see *ProjectPrel*)
>   - optimizer rules for any plugin-specific pushdown, which are implemented
> within the plugin and added to the list of rules for that plugin (e.g. see
> *MapRDBFormatPlugin.getOptimizerRules()*).  These are then automatically
> picked up by Drill.
>
> Aman
>
>
> On Mon, May 27, 2019 at 11:12 PM Ted Dunning <[email protected]>
> wrote:
>
>> Yes. CTAS should be a similar problem to unsafe inserts.
>>
>> We have a few people interested in the work. What is needed more is
>> pointers to where to find out about the details.
>>
>> 1. How can we enable the syntax?
>>
>> 2. What operators are really necessary?
>>
>> 3. How should writers inject insert optimizer rules to allow insert or
>> update operator pushdown?
>>
>>
>>
>> On Mon, May 27, 2019 at 9:42 PM Paul Rogers <[email protected]>
>> wrote:
>>
>> > Hi Ted,
>> >
>> > Drill can do a CTAS today, which uses a writer provided by the format
>> > plugin. One would think this same structure could work for an INSERT
>> > operation, with a writer provided by the storage plugin. The devil, of
>> > course, is always in the details. And in finding resources to do the
>> > work...
>> >
>> > Thanks,
>> > - Paul
>> >
>> >
>> >
>> >     On Monday, May 27, 2019, 5:28:27 PM PDT, Ted Dunning <
>> > [email protected]> wrote:
>> >
>> >  I have in mind the ability to push rows to an underlying DB without any
>> > transactional support.
>> >
>> >
>> >
>> >
>> >
>>
>
