Hi Thomas,
Thanks for taking the time to look into this.
I haven't been able to create a simple reproducible test case with
generated data. What follows is the closest I could get.
I am sure the problem I had with the queries running against the data
in my original post would have completed eventually, but it didn't
seem right that they should long for such a long time. So I wanted to
know if that is a known problem and if there are workarounds like the
proposed MERGE solution.
With my original data I also tried adding an index to the source WS
table, but even that seemed to take a very long time. So the MERGE
approach was the only thing that worked for me in the end.
DROP ALL OBJECTS;
CREATE TABLE WS(VAL VARCHAR) AS
SELECT HASH('SHA256', MOD(X, 1300000), 1) || HASH('SHA256', MOD(X, 10), 1) VAL
FROM SYSTEM_RANGE(1, 1500000);
SELECT COUNT(VAL), COUNT(DISTINCT VAL) FROM WS;
CREATE TABLE WS2(VAL VARCHAR);
CREATE UNIQUE INDEX IDX_WS2_VAL ON WS2(VAL);
EXPLAIN ANALYZE
INSERT INTO WS2(VAL) DIRECT SELECT DISTINCT(VAL) FROM WS;
TRUNCATE TABLE WS2;
EXPLAIN ANALYZE
MERGE INTO WS2(VAL) KEY(VAL) SELECT VAL FROM WS;
Please let me know if you want to pursue this any further. Then I will
try to make the original data available as a test case. The only
difference in the original data I can think of is that many values
share a very long common prefix.
Regards,
Knut Wannheden
On Sun, Dec 18, 2011 at 18:55, Thomas Mueller
<[email protected]> wrote:
> Hi,
>
> I can't reproduce the problem, could you post a reproducible test case
> please? My test case is:
>
> drop all objects;
> CREATE TABLE WS(VAL VARCHAR) as select cast(x*0.8 as int) from
> system_range(1, 1000000);
> create index idx_ws_val on ws(val);
> select top 100 * from ws;
> SELECT COUNT(VAL), COUNT(DISTINCT VAL) FROM WS;
> CREATE TABLE WS2(VAL VARCHAR) AS SELECT DISTINCT(VAL) FROM WS;
>
> (the query "SELECT COUNT(VAL), COUNT(DISTINCT VAL) FROM WS" actually
> ran out of memory because the values are kept fully in memory, which
> isn't nice, but I don't plan to fix that currently)
>
> It worked with and without the index. I agree the last statement is a
> bit slow (45 seconds), but I don't plan to analyze it currently (it
> doesn't look like a very serious problem to me). If you want, please
> go ahead. I guess it has to do with a temporary result set. This here
> is much faster:
>
> drop all objects;
> CREATE TABLE WS(VAL VARCHAR) as select cast(x*0.8 as int) from
> system_range(1, 1000000);
> CREATE TABLE WS2(VAL VARCHAR);
> insert into ws2 SELECT DISTINCT(VAL) FROM WS;
>
> Regards,
> Thomas
>
> --
> You received this message because you are subscribed to the Google Groups "H2
> Database" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/h2-database?hl=en.
>
--
You received this message because you are subscribed to the Google Groups "H2
Database" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/h2-database?hl=en.