Re: bulk insert

Mike Matrigali Mon, 30 Aug 2010 14:27:45 -0700

Rick Hillegas wrote:

Mike Matrigali wrote:
Rick Hillegas wrote:
Mike Matrigali wrote:
Rick Hillegas wrote:
Hi Mike,
Thanks for the quick response. Some comments inline...

Mike Matrigali wrote:
I would vote -1 on this, if the proposal is to allow unlogged inserts
into non-empty tables.  I do not want to add syntax that basically
allows users to silently corrupt their db.
I poorly described what I meant. I am not suggesting that we invent a new mechanism for bulk insert. I'm merely suggesting that we re-use the existing bulk insert mechanism used by the import procedures. When I said that logging would be turned off, I only meant that it would be turned off in the way that it is for the import procedures. This is my understanding of how that existing logic works:
o The import procedure cooks up an INSERT statement which specifies "insertMode=bulkInsert". At execution time, if that INSERT statement specifies "insertMode=bulkInsert" AND the table is empty, then a new conglomerate is created on the side and the inserts go into that new conglomerate. As you note, the conglomerate creation is logged and the old conglomerate is replaced with the new conglomerate only if the insert succeeds.
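
The flow described above can be sketched as a toy model in Java. This is only an illustration of the decision logic (bulk requested AND table empty, load on the side, swap on success). The classes `Conglomerate` and `Table` and the method `insert` are hypothetical stand-ins, not Derby's internal code, and logging is not modeled at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: every name here is hypothetical, not Derby's internals.
class BulkInsertSketch {

    /** Stand-in for a conglomerate: just a list of rows. */
    static final class Conglomerate {
        final List<String> rows = new ArrayList<>();
    }

    /** Stand-in for a table that points at its current conglomerate. */
    static final class Table {
        Conglomerate current = new Conglomerate();

        boolean isEmpty() {
            return current.rows.isEmpty();
        }
    }

    /** Inserts rows; returns true if the side-conglomerate path was taken. */
    static boolean insert(Table table, List<String> newRows, boolean bulkRequested) {
        if (bulkRequested && table.isEmpty()) {
            // Load into a new conglomerate created on the side.
            Conglomerate side = new Conglomerate();
            for (String row : newRows) {
                if (row == null) {
                    // Simulated failure: the old conglomerate is left untouched.
                    throw new IllegalArgumentException("bad row");
                }
                side.rows.add(row);
            }
            // Replace the old conglomerate only after every row has loaded.
            table.current = side;
            return true;
        }
        // Ordinary insert path (row logging and rollback are not modeled here).
        table.current.rows.addAll(newRows);
        return false;
    }

    public static void main(String[] args) {
        Table t = new Table();
        System.out.println(insert(t, Arrays.asList("a", "b"), true)); // table was empty
        System.out.println(insert(t, Arrays.asList("c"), true));      // table was not empty
        System.out.println(t.current.rows);
    }
}
```

In this model a failed bulk load simply discards the side conglomerate, which is the property that makes the existing import behavior safe.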
Ok, I don't have a problem if the same mechanism with same existing
behavior is being used, the existing behaviors are safe and should
not lead to corruptions or any need by customer to recover themselves.
Great!
If the feature is provided we should note the extra overhead that this
may cause an insert, just so someone doesn't put this on all of their
inserts.  There is overhead for the system to check if the table is
empty before doing the insert. For import it seems obvious that the user has gone to the trouble to use a different command, so the cost of an empty table
check is likely not much.  But on an insert statement it is not as obvious.
The worst case for the check is a table that had a large amount of data
that has all been deleted. The empty check might require reading a large number of empty pages - depending on what kind of space reclamation has gone on. I think that is also a benefit of the replace
option, we don't need to do the check.
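
A rough way to picture that worst case is the toy model below. It assumes a hypothetical page layout in which deleted rows leave their pages allocated and no space reclamation has run; `PagedTable` and its methods are made up for illustration and are not Derby's storage code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical page layout, not Derby's storage layer.
class EmptyCheckSketch {

    static final class PagedTable {
        // One entry per allocated page: the number of live rows on that page.
        final List<Integer> liveRowsPerPage = new ArrayList<>();
        int pagesRead = 0;

        void addPage(int liveRows) {
            liveRowsPerPage.add(liveRows);
        }

        /** Deletes every row but keeps the pages allocated (no reclamation). */
        void deleteAllRows() {
            for (int i = 0; i < liveRowsPerPage.size(); i++) {
                liveRowsPerPage.set(i, 0);
            }
        }

        /** Scans pages until a live row is found, counting each page read. */
        boolean isEmpty() {
            for (int live : liveRowsPerPage) {
                pagesRead++;
                if (live > 0) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        PagedTable t = new PagedTable();
        for (int i = 0; i < 1000; i++) {
            t.addPage(10);
        }
        System.out.println(t.isEmpty() + " after reading " + t.pagesRead + " page(s)");

        t.deleteAllRows();
        t.pagesRead = 0;
        System.out.println(t.isEmpty() + " after reading " + t.pagesRead + " page(s)");
    }
}
```

In this model the check stops at the first live row when the table has data, but has to walk every allocated page once all rows have been deleted.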
Thanks. I'll add this to the documentation for this capability.
I am not sure the internal syntax is the best way to go. I believe it was originally eliminated because it was non standard. Maybe some new
syntax would be more appropriate.
Thanks for that piece of history. I wondered why the feature was disabled. Right now the syntax is the properties extension which we also use for optimizer overrides. I'm not aware of any standard language for this. We could introduce two non-reserved keywords for this: BULK and REPLACE. How does the following sound:
insert [ bulk | replace ] into  ...
I prefer the suggested syntax over the properties syntax, but don't like
adding nonstandard syntax to derby. Especially to basic sql like insert. The replace option is also a huge departure from the
standard behavior.
The standard does allow vendors to extend SQL provided that the extensions do not conflict with language approved by the SQL committee. There are many examples of Derby extensions, including many which were present when Derby was open-sourced originally. I agree that we should be cautious about introducing extensions. I also agree that this proposed syntax troubles me less than the optimizer overrides do.
When we were packaging up Derby as a standards based db for apache
we pulled out whatever nonstandard syntax/behavior we could.  When there
was a feature we wanted to provide where we could find no standard, system procedures were used. Putting the non-standard behavior into procedures makes it very obvious to users that the functionality is non-standard. If someone wants to port their application from derby to some other db, it is a simple rule of thumb to not include system procedure calls. As you pointed out this was not done for optimizer hints
which didn't lend itself well to a procedure based approach.  There we
hid the syntax in comments so we weren't really touching the "real"
standard syntax.  It is kind of a hack, so really would not use it as
a model for adding more features to existing standard syntax.

Going back to why you said users wanted this feature.  Would users get
the functionality they are looking for if we added import procedures which took an SQL table as input rather than a file? The SQL table could also be a table function, now that they are supported. I think we would have added this import procedure originally if table functions had existed at the time.
Table functions can take arguments, including ? parameters and arbitrarily complicated expressions. I don't think these would be easy to model with system procedures.

Without ? parameters it seems straightforward.  You create the table
function outside of the procedure and give it a name.  Then you pass
that name into the import procedure.

I was hoping that we could just pass either the table type or
a result set into the system procedure and then it would solve all the
issues.  But I see you can only pass data types into system procedures.

Are there any new standards that would allow ResultSets to be passed
to table functions?  Do user defined types help at all?


Thanks,
-Rick


Thanks,
-Rick
