Re: [Monetdb-developers] Performance of COPY INTO: data source & constraint checking

Sjoerd Mullender Fri, 24 Jul 2009 05:31:18 -0700

Roberto Cornacchia wrote:
> I come back to this issue with some more interesting results.
> 
> I did mention in the past that 'COPY INTO ... FROM stdin ..' seems to be
> a lot slower than 'COPY INTO ... FROM <filename> ...'
> 
> Actually, I have just found something that may shift the problem away
> from the stdin itself.
> 
> Look at the following two methods of restoring a previously dumped
> (rather small) database. 
> 
> RESTORE METHOD 1 (far too slow):
> 
> $ mclient < backup.sql
> [ 62  ]
> [ 70199       ]
> [ 288 ]
> [ 70322       ]
> [ 69  ]
> [ 423 ]
> [ 280596      ]
> [ 4302        ]
> [ 49038       ]
> [ 1   ]
> [ 10  ]
> [ 6943        ]
> [ 6943        ]
> [ 3424        ]
> [ 3424        ]
> [ 3424        ]
> [ 1   ]
> [ 6943        ]
> [ 3424        ]
> TIME elapsed: 68 seconds
> 
> 
> RESTORE METHOD 2:
> 
> $ echo "\< backup.sql" | mclient
> [ 62  ]
> [ 70199       ]
> [ 288 ]
> [ 70322       ]
> [ 69  ]
> [ 423 ]
> [ 280596      ]
> [ 4302        ]
> [ 49038       ]
> [ 1   ]
> [ 10  ]
> [ 6943        ]
> [ 6943        ]
> [ 3424        ]
> [ 3424        ]
> [ 3424        ]
> [ 1   ]
> [ 6943        ]
> [ 3424        ]
> TIME elapsed: 1 seconds
> 
> Can anyone explain this difference?


When mclient reads from standard input, it reads a line at a time.  When
it reads from a file (either via the \< construct or as a command-line
argument, i.e. without <), it reads in large chunks.
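
To get a feel for why that matters, here is a rough sketch in Python. It is
not mclient's code and does not model what mclient does with each line; it
only counts how many read calls each strategy needs for the same input. The
row contents, row count, and the 1 MiB chunk size are made up for
illustration.

```python
import io


def count_line_reads(stream):
    """Consume the stream one line per call; return the number of calls that returned data."""
    calls = 0
    while stream.readline():
        calls += 1
    return calls


def count_chunk_reads(stream, chunk_size=1 << 20):
    """Consume the stream in chunks of up to chunk_size bytes; return the number of calls that returned data."""
    calls = 0
    while stream.read(chunk_size):
        calls += 1
    return calls


# Hypothetical dump payload: 100,000 short rows of 10 bytes each.
data = b"1|foo|bar\n" * 100_000

line_calls = count_line_reads(io.BytesIO(data))
chunk_calls = count_chunk_reads(io.BytesIO(data))
print("line-at-a-time reads:", line_calls)
print("chunked reads:", chunk_calls)
```

Whatever fixed cost is attached to each read is paid once per row in the
first case and only once per chunk in the second.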

> One further consideration: Method 2 works fine, but cannot be used to
> load gzipped dumps. Maybe this can be a feature request: '\<' should
> accept gzipped input.

Patches are gratefully accepted.  ;-)
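
Until someone sends one, a possible workaround is to decompress the dump to
a temporary file first and then hand that file to mclient as a command-line
argument, so it is read in large chunks. A minimal sketch, assuming a
gzip-compressed dump; the helper name decompress_to_tempfile and the demo
file are made up for illustration, and the script does not call mclient
itself.

```python
import gzip
import os
import shutil
import tempfile


def decompress_to_tempfile(gz_path):
    """Decompress gz_path into a new temporary .sql file and return its path."""
    fd, out_path = tempfile.mkstemp(suffix=".sql")
    with os.fdopen(fd, "wb") as out, gzip.open(gz_path, "rb") as src:
        # Copy in 1 MiB blocks so a large dump never has to fit in memory.
        shutil.copyfileobj(src, out, 1 << 20)
    return out_path


# Demo with a tiny, made-up dump (stands in for a real backup.sql.gz).
demo_dir = tempfile.mkdtemp()
demo_gz = os.path.join(demo_dir, "backup.sql.gz")
with gzip.open(demo_gz, "wb") as f:
    f.write(b"SELECT 1;\n")

plain_path = decompress_to_tempfile(demo_gz)
with open(plain_path, "rb") as f:
    restored = f.read()
print(plain_path)
```

The printed path can then be given to mclient as a file argument (without
<); remove the temporary file afterwards.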

> Roberto
> 
> 
> On Mon, 2009-05-18 at 13:03 +0200, Stefan Manegold wrote:
>> On Mon, May 18, 2009 at 11:57:57AM +0200, Stefan de Konink wrote:
>>> On Mon, 18 May 2009, Stefan Manegold wrote:
>>>
>>>> In case you came to your conclusion from experimental/empirical study, I
>>>> would be very interested to know about the experiments performed and the
>>>> results measured.
>>>> In that case, I would be very thankful if you could share the informative
>>>> details with us, i.e. detailed descriptions of
>>>> - experimental setup (HW, SW, datasets, workload/queries)
>>>> - measured results (what was measured how, and what are the results?)
>>>> - (your) discussion/analysis/interpretation of the results
>>> I thought the MADAM meeting was for this.
>> Ok. fine.
>>
>> ... we should then revive the ideas of keeping note of MADAMs (e.g., on a
>> Wiki or alike) as reference for those who cannot be present ...
>>
>>>> Could you please elaborate on whether you consider your setup/schema
>>>> "representative" and/or whether/to which extent you think that your results
>>>> for one specific case could be generalized to other cases?
>>> Technically the only thing I have to prove to get my code reverted is that
>>> there are cases where late constraints hurt performance.
>> IMHO, our task should be the following:
>>
>> - analyze whether there is indeed a significant performance difference
>>   between early and late constraint checking;
>>   (5% do not justify any further time to be spent on this)
>> - and if so, analyze under which circumstances which alternative performs
>>   (significantly) better or worse than the other;
>> - and (only) then try to find out where the (significant) performance
>>   difference comes from
>> - and (only) then 
>>   + either try to eliminate the difference;
>>   + or 
>>     * make dump/restore use the most suitable alternative (as far as
>>       possible with reasonable effort);
>>     * and document our experiences to give advice for users
>>
>> Stefan
>>
>>> Stefan
>>>


-- 
Sjoerd Mullender
