For COPY TO you can try increasing the page timeout or decreasing the page
size. The options are listed below with their default values:

PAGETIMEOUT=10     - the page timeout in seconds for fetching results
PAGESIZE=1000      - the page size (number of rows) for fetching results

You can pass these options to the COPY command in a WITH clause, for
example by appending "WITH PAGETIMEOUT=1000;".
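
As a rough sketch, both options can be combined with AND. The keyspace,
table and file names here are placeholders, and the values are only a
starting point to tune for your cluster:

```
-- my_keyspace.my_table and export.csv are hypothetical names
COPY my_keyspace.my_table TO 'export.csv'
  WITH PAGETIMEOUT=1000 AND PAGESIZE=100;
```

A smaller page size means each request fetches fewer rows, and a longer
page timeout gives each request more time before it is reported as timed
out.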

COPY TO will be slower than Spark, but you can improve its performance by
installing the Python driver with Cython extensions, as explained in the
Setup section of this blog post:
<http://www.datastax.com/dev/blog/six-parameters-affecting-cqlsh-copy-from-performance>.
The post also explains how to compile the copy module itself with Cython.
That is less important than compiling the driver, and on some versions you
may hit CASSANDRA-11574
<https://issues.apache.org/jira/browse/CASSANDRA-11574>.



On Tue, May 10, 2016 at 6:39 PM, Matthias Niehoff <
matthias.nieh...@codecentric.de> wrote:

> Hi,
>
> I had already thought that COPY TO might not be the best way to do this.
> I'll write a small Spark job.
>
> Thanks
>
> 2016-05-10 10:36 GMT+02:00 Carlos Rolo <r...@pythian.com>:
>
>> Hello,
>>
>> That is a lot of data for a "COPY TO".
>>
>> If you want a fast way to export, and you're fine with Java, you can use
>> Cassandra SSTableReader classes to read the sstables directly. Spark also
>> works.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
>> *linkedin.com/in/carlosjuzarterolo
>> <http://linkedin.com/in/carlosjuzarterolo>*
>> Mobile: +351 918 918 100
>> www.pythian.com
>>
>> On Tue, May 10, 2016 at 9:33 AM, Matthias Niehoff <
>> matthias.nieh...@codecentric.de> wrote:
>>
>>> Sorry, I sent that too early.
>>>
>>> more errors:
>>>
>>> /export.cql:9:Error for (4549395184516451179, 4560441269902768904): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {<Host: 10.1.8.5 datacenter1>: ConnectionException('Host has been marked 
>>> down or removed',)}) (will try again later attempt 1 of 5)
>>> /export.cql:9:Error for (-2083690356124961461, -2068514534992400755): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {}) (will try again later attempt 1 of 5)
>>> /export.cql:9:Error for (-4899866517058128956, -4897773268483324406): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {}) (will try again later attempt 1 of 5)
>>> /export.cql:9:Error for (-1435092096023471089, -1434747957681478442): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {}) (will try again later attempt 1 of 5)
>>> /export.cql:9:Error for (-2804962318029794069, -2783747272192843127): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {}) (will try again later attempt 1 of 5)
>>> /export.cql:9:Error for (-5188633782964403059, -5149722481923709224): 
>>> NoHostAvailable - ('Unable to complete the operation against any hosts', 
>>> {}) (will try again later attempt 1 of 5)
>>>
>>>
>>>
>>> It looks like the cluster and its nodes cannot handle the export.
>>>
>>> Is cqlsh COPY able to export this amount of data, or should other
>>> methods be used (sstableloader, some custom code, Spark…)?
>>>
>>>
>>> Best regards
>>>
>>>
>>> 2016-05-10 10:29 GMT+02:00 Matthias Niehoff <
>>> matthias.nieh...@codecentric.de>:
>>>
>>>> Hi,
>>>>
>>>> I am trying to export the data of a table (~15 GB) using cqlsh COPY TO.
>>>> It fails with "no host available". If I try it with a smaller table,
>>>> everything works fine.
>>>>
>>>> The statistics of the big table:
>>>>
>>>>                 SSTable count: 81
>>>>                 Space used (live): 14102945336
>>>>                 Space used (total): 14102945336
>>>>                 Space used by snapshots (total): 62482577
>>>>                 Off heap memory used (total): 16399540
>>>>                 SSTable Compression Ratio: 0.1863544514417909
>>>>                 Number of keys (estimate): 5034845
>>>>                 Memtable cell count: 5590
>>>>                 Memtable data size: 18579542
>>>>                 Memtable off heap memory used: 0
>>>>                 Memtable switch count: 72
>>>>                 Local read count: 0
>>>>                 Local read latency: NaN ms
>>>>                 Local write count: 139878
>>>>                 Local write latency: 0.023 ms
>>>>                 Pending flushes: 0
>>>>                 Bloom filter false positives: 0
>>>>                 Bloom filter false ratio: 0.00000
>>>>                 Bloom filter space used: 6224240
>>>>                 Bloom filter off heap memory used: 6223592
>>>>                 Index summary off heap memory used: 1098860
>>>>                 Compression metadata off heap memory used: 9077088
>>>>                 Compacted partition minimum bytes: 373
>>>>                 Compacted partition maximum bytes: 1358102
>>>>                 Compacted partition mean bytes: 16252
>>>>                 Average live cells per slice (last five minutes): 0.0
>>>>                 Maximum live cells per slice (last five minutes): 0.0
>>>>                 Average tombstones per slice (last five minutes): 0.0
>>>>                 Maximum tombstones per slice (last five minutes): 0.0
>>>>
>>>>
>>>> Some of the errors:
>>>>
>>>> /export.cql:9:Error for (269754647900342974, 272655475232221549): 
>>>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>>>> attempt 1 of 5)
>>>> /export.cql:9:Error for (-3191598516608295217, -3188807168672208162): 
>>>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>>>> attempt 1 of 5)
>>>> /export.cql:9:Error for (-3066009427947359685, -3058745599093267591): 
>>>> OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again later 
>>>> attempt 1 of 5)
>>>> /export.cql:9:Error for (-1737068099173540127, -1716693115263588178): 
>>>> OperationTimedOut - errors={}, last_host=10.1.8.5 (will try again later 
>>>> attempt 1 of 5)
>>>> /export.cql:9:Error for (-655042025062419794, -627527938552757160): 
>>>> OperationTimedOut - errors={}, last_host=10.1.12.89 (will try again later 
>>>> attempt 1 of 5)
>>>> /export.cql:9:Error for (2441403877625910843, 2445504271098651532): 
>>>> OperationTimedOut - errors={}, last_host=10.1.12.89 (permanently given up 
>>>> after 1000 rows and 1 attempts)
>>>>
>>>>
>>>> …
>>>>
>>>>
>>>>
>>>> --
>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
>>>> 172.1702676
>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>>>> www.more4fi.de
>>>>
>>>> Registered office: Solingen | HRB 25917 | District Court Wuppertal
>>>> Management board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>> Supervisory board: Patric Fedlmeier (Chairman) . Klaus Jäger . Jürgen
>>>> Schütz
>>>>
>>>> This e-mail, including any attached files, contains confidential
>>>> and/or legally protected information. If you are not the intended
>>>> recipient or have received this e-mail in error, please notify the
>>>> sender immediately and delete this e-mail and any attached files
>>>> without delay. Unauthorized copying, use, or opening of any attached
>>>> files, as well as unauthorized forwarding of this e-mail, is not
>>>> permitted.
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>



-- 



Stefania Alborghetti

Apache Cassandra Software Engineer

|+852 6114 9265| stefania.alborghe...@datastax.com

