[ 
https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991279#comment-14991279
 ] 

Stefania edited comment on CASSANDRA-9304 at 11/5/15 9:11 AM:
--------------------------------------------------------------

Thank you for your input. 

Regarding version support for Windows, 2.2+ is fine, but for completeness I'll 
point out that the only obstacle left in 2.1 is the name of the file (_cqlsh_ 
-> _cqlsh.py_).

Regarding the problem with pipes, I've replaced pipes with queues so that we 
don't need to deal with low-level, platform-specific details. Queues can also 
be safely used from the callback threads, which was not the case for pipes.
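
As a minimal sketch of the queue approach (not the code in the patch; 
{{collect_from_callbacks}} and {{on_page}} are hypothetical names), several 
callback threads can put results on a single {{multiprocessing.Queue}} and the 
consumer simply reads them back:

```python
import multiprocessing
import threading


def collect_from_callbacks(num_callbacks):
    """Gather one result from each of num_callbacks threads via a queue."""
    results = multiprocessing.Queue()

    def on_page(page_number):
        # Hypothetical callback: it runs on its own thread and hands its
        # result over through the queue instead of writing to a pipe.
        results.put(page_number)

    threads = [threading.Thread(target=on_page, args=(i,))
               for i in range(num_callbacks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Arrival order depends on thread scheduling, so sort before returning.
    return sorted(results.get(timeout=5) for _ in range(num_callbacks))
```

The same queue object can also be passed to a child process, which is what 
makes it a drop-in replacement for a pipe without any platform-specific 
handling.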

Regarding the problem with the driver, -I haven't tested in 2.2 but I don't 
think it matters which version since- I verified that the problem applies to 
2.2 as well. Yesterday I was using the latest cassandra-test driver version; 
today I used 2.7.2. In both cases the column type is the same, 
{{cassandra.cqltypes.BytesType}}, and the method called from 
{{recv_result_rows()}} is the same, {{<bound method 
CassandraTypeType.from_binary of <class 'cassandra.cqltypes.BytesType'>>}}. 
However, {{cls.deserialize}} in {{from_binary}} is a lambda in the case that 
works and the default implementation, {{_CassandraType.deserialize}}, in the 
case that does not. I don't know where the lambda comes from, but I noticed 
there is a Cython deserializer for {{BytesType}} in deserializers.pyx. I don't 
know how Cython works, but if it is picked up in the normal case, then the 
problem is again with the way multiprocessing imports modules. 

The problem can be solved by adding a {{deserialize}} implementation to 
{{BytesType}}, as is done for other types:

{code}
Stefi@Lila MINGW64 ~/git/cstar/python-driver ((2.7.2))
$ git diff
diff --git a/cassandra/cqltypes.py b/cassandra/cqltypes.py
index f39d28b..eb8d3b6 100644
--- a/cassandra/cqltypes.py
+++ b/cassandra/cqltypes.py
@@ -350,6 +350,10 @@ class BytesType(_CassandraType):
     def serialize(val, protocol_version):
         return six.binary_type(val)

+    @staticmethod
+    def deserialize(byts, protocol_version):
+        return bytearray(byts)
+

 class DecimalType(_CassandraType):
     typename = 'decimal'
{code}
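
To illustrate what the override changes, here is a simplified stand-in for the 
driver's class hierarchy. It is not the driver code: {{BaseType}}, 
{{BlobWithoutOverride}} and {{BlobWithOverride}} are hypothetical names, and 
the assumption is only that {{from_binary}} dispatches to {{cls.deserialize}} 
and that the default implementation returns the raw bytes unchanged:

```python
class BaseType(object):
    """Hypothetical stand-in for the driver's base type class."""

    @classmethod
    def from_binary(cls, byts, protocol_version):
        # Dispatch to whichever deserialize the concrete class provides.
        return cls.deserialize(byts, protocol_version)

    @staticmethod
    def deserialize(byts, protocol_version):
        # Assumed default: hand back the raw bytes untouched.
        return byts


class BlobWithoutOverride(BaseType):
    """Falls back to the default, so callers receive a plain byte string."""


class BlobWithOverride(BaseType):
    """Mirrors the patch: always return a bytearray."""

    @staticmethod
    def deserialize(byts, protocol_version):
        return bytearray(byts)
```

With the override, the caller gets a {{bytearray}} regardless of which 
deserializer would otherwise have been picked up, so downstream formatting 
code can tell a blob apart from text.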

If this is not enough and you want to debug some more, [~aholmber], you can use 
the attached 2.1 patch. I'm still working on the 2.2 merge. You need to 
generate a table with a blob column; I used cassandra-stress. Then run {{COPY 
<anytable> TO 'anyfile';}} from cqlsh. This should result in a Unicode decode 
error on Windows because the blob is received as a string. If you prefer 
that I test things for you, that works too.
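
The decode error described above can be reproduced in isolation. The sketch 
below is not cqlsh's formatter; {{format_blob}} and {{decodes_as_utf8}} are 
hypothetical helpers that show why arbitrary blob bytes fail when treated as 
text, while a value known to be binary can be written out as hex:

```python
import binascii


def format_blob(value):
    """Render a binary value as a CQL-style hex literal such as 0x00ff."""
    return '0x' + binascii.hexlify(bytes(value)).decode('ascii')


def decodes_as_utf8(raw):
    """Return True if raw is valid UTF-8, False if decoding fails."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False
```

A blob such as {{b'\xff\xfe'}} is not valid UTF-8, so any code path that 
treats it as a string fails, whereas {{format_blob(bytearray(b'\x00\xff'))}} 
produces a hex literal without decoding anything.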



> COPY TO improvements
> --------------------
>
>                 Key: CASSANDRA-9304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9304
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Minor
>              Labels: cqlsh
>             Fix For: 3.x, 2.1.x, 2.2.x
>
>
> COPY FROM has gotten a lot of love.  COPY TO not so much.  One obvious 
> improvement could be to parallelize reading and writing (write one page of 
> data while fetching the next).



