[
https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728422#comment-14728422
]
Stefania commented on CASSANDRA-9304:
-------------------------------------
The {{RateLimiter}} still seems a bit off. It looked somewhat wrong before too, as
you pointed out. It's not terribly important, but I think this line,
{{self.current_rate = (self.current_rate + new_rate) / 2.0}}, was meant as an
average of the current rate and the new one. So the first time, when
{{current_rate}} is zero, it should not divide by 2, or else we report half the
actual rate. Secondly, when we calculate the new rate as {{n / difference}}, we may
miss records, because {{n}} is the number of records passed to each call whilst
{{difference}} is the time elapsed since we last logged. I also wouldn't
calculate the rate on every call, but only when logging it. If
{{current_record}} cannot be reset to zero after logging (maybe this was the
initial intention of the existing code), then we need a new counter that gives
the number of records accumulated between log points.
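To illustrate, here is a minimal sketch of what I have in mind. The class and attribute names are hypothetical, not the actual cqlsh code; the point is that records are accumulated between log points and the rate is computed over the whole interval, only when logging:
{code}
import time

class RateMeter(object):
    """Counts records between log points; computes the rate only when logging."""

    def __init__(self, log_interval=1.0, time_source=time.time):
        self.time_source = time_source      # injectable clock, to make testing easy
        self.log_interval = log_interval    # seconds between log points
        self.last_log_time = time_source()
        self.records_since_log = 0          # accumulated between log points
        self.total_records = 0
        self.current_rate = 0.0

    def increment(self, n=1):
        self.records_since_log += n
        self.total_records += n
        now = self.time_source()
        difference = now - self.last_log_time
        if difference >= self.log_interval:
            # Rate over the whole interval, using every record seen in it,
            # not just the n passed to this particular call.
            self.current_rate = self.records_since_log / difference
            self.last_log_time = now
            self.records_since_log = 0      # reset the per-interval counter
{code}
With a fake clock, two calls of 100 records spread over 2 seconds report a rate of 100 records/s, with no halving on the first report and no records lost between calls.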
It's great that we now test with all partitioners, but we only export 1 record
in {{test_all_datatypes_round_trip}}, so a better candidate would have been
{{test_round_trip}}, where we export at least 10K records. Would you mind
adapting {{test_round_trip}} to also run with every partitioner?
In fact, it would be good to have a bulk round-trip test as well (only for the
default partitioner), where we export and import 1M records. We would need to
use cassandra-stress to write the records, and then we would just check the
counts. This is just a suggestion.
I had problems when running the cqlsh_tests locally:
{code}
nosetests -s cqlsh_tests
{code}
{code}
======================================================================
ERROR: test_source_glass (cqlsh_tests.cqlsh_tests.TestCqlsh)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/stefania/git/cstar/cassandra-dtest/tools.py", line 252, in wrapped
f(obj)
File "/home/stefania/git/cstar/cassandra-dtest/cqlsh_tests/cqlsh_tests.py",
line 341, in test_source_glass
self.verify_glass(node1)
File "/home/stefania/git/cstar/cassandra-dtest/cqlsh_tests/cqlsh_tests.py",
line 102, in verify_glass
'I can eat glass and it does not hurt me': 'Is'
File "/home/stefania/git/cstar/cassandra-dtest/cqlsh_tests/cqlsh_tests.py",
line 95, in verify_varcharmap
got = {k.encode("utf-8"): v for k, v in rows[0][0].iteritems()}
IndexError: list index out of range
-------------------- >> begin captured logging << --------------------
dtest: DEBUG: cluster ccm directory: /tmp/dtest-Ldxvcq
--------------------- >> end captured logging << ---------------------
======================================================================
FAIL: test_all_datatypes_read (cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File
"/home/stefania/git/cstar/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py",
line 690, in test_all_datatypes_read
self.assertCsvResultEqual(self.tempfile.name, results)
File
"/home/stefania/git/cstar/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py",
line 153, in assertCsvResultEqual
raise e
AssertionError: Element counts were not equal:
First has 1, Second has 0: ['ascii', '1099511627776', '0xbeef', 'True',
'3.140000000000000124344978758017532527446746826171875', '2.444', '1.1',
'127.0.0.1', '25',
'\xe3\x83\xbd(\xc2\xb4\xe3\x83\xbc\xef\xbd\x80)\xe3\x83\x8e', '2005-07-14
12:30:00', '2b4e32ce-51de-11e5-85b7-0050b67e8b2f',
'830bc4cd-a790-4ac2-85f9-648b0a71306b', 'asdf', '36893488147419103232']
First has 0, Second has 1: ['ascii', '1099511627776', '0xbeef', 'True',
'3.140000000000000124344978758017532527446746826171875', '2.444', '1.1',
'127.0.0.1', '25',
'\xe3\x83\xbd(\xc2\xb4\xe3\x83\xbc\xef\xbd\x80)\xe3\x83\x8e', '2005-07-14
04:30:00', '2b4e32ce-51de-11e5-85b7-0050b67e8b2f',
'830bc4cd-a790-4ac2-85f9-648b0a71306b', 'asdf', '36893488147419103232']
-------------------- >> begin captured logging << --------------------
dtest: DEBUG: cluster ccm directory: /tmp/dtest-cSohP9
dtest: DEBUG: Importing from csv file: /tmp/tmpJgdPJc
dtest: WARNING: Mismatch at index: 10
dtest: WARNING: Value in csv: 2005-07-14 12:30:00
dtest: WARNING: Value in result: 2005-07-14 04:30:00
--------------------- >> end captured logging << ---------------------
----------------------------------------------------------------------
Ran 69 tests in 1161.775s
FAILED (SKIP=5, errors=1, failures=1)
{code}
I scheduled new CI jobs on my view:
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9304-testall/
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9304-dtest/
Let's see if they too report the problems I had locally.
> COPY TO improvements
> --------------------
>
> Key: CASSANDRA-9304
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9304
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: David Kua
> Priority: Minor
> Labels: cqlsh
> Fix For: 2.1.x
>
>
> COPY FROM has gotten a lot of love. COPY TO not so much. One obvious
> improvement could be to parallelize reading and writing (write one page of
> data while fetching the next).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)