Just noticed you'd sent this to the dev list. This is a question for the
user list only; please don't send questions of this type to the developer
list.

On Thu, Jan 8, 2015 at 8:33 AM, Ryan Svihla <r...@foundev.pro> wrote:

> The nature of replication factor is such that writes will go wherever
> there are replicas. If you want faster responses and don't want to
> involve the REST data center in the Spark job's response path, I suggest
> using a CQL driver with the LOCAL_ONE or LOCAL_QUORUM consistency level
> (see the Spark Cassandra Connector:
> https://github.com/datastax/spark-cassandra-connector ). Write traffic
> will still be replicated to the REST service data center, since you do
> want those results available there, but you will not be waiting on the
> remote data center to respond "successful".
>
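As a concrete sketch of the approach above: the connector can be pointed at the analytics data center and pinned to a local consistency level through its configuration properties. The property names below follow the Spark Cassandra Connector reference (check them against the connector version in use); the host and data center names are placeholders:

```
spark.cassandra.connection.host           <analytics-node-1>,<analytics-node-2>
spark.cassandra.connection.local_dc       Analytics
spark.cassandra.output.consistency.level  LOCAL_QUORUM
```

With the local data center set, the driver's DC-aware load balancing keeps coordinators, and therefore the consistency-level wait, in the analytics data center.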
> Final point: bulk loading sends a copy per replica across the wire. So
> let's say you have RF 3 in each data center; that means bulk loading
> will send out 6 copies from that client at once, while normal mutations
> via Thrift or CQL writes go out between data centers as 1 copy, which
> the receiving node then forwards on to the other replicas. This means
> inter-data-center traffic in this case would be 3x more with the bulk
> loader than with a traditional CQL or Thrift based client.
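The arithmetic above can be made explicit; a minimal sketch assuming RF 3 in each of two data centers:

```python
# Copies sent per write, under the RF 3 / two-DC assumptions above.
rf_per_dc = 3
num_dcs = 2

# Bulk loader: one copy streamed directly to every replica in every DC.
bulk_copies_from_client = rf_per_dc * num_dcs        # 6

# Normal CQL/Thrift write: one copy crosses the WAN to the remote DC,
# where the receiving node forwards it to the other local replicas.
normal_cross_dc_copies = 1
bulk_cross_dc_copies = rf_per_dc                     # 3

ratio = bulk_cross_dc_copies / normal_cross_dc_copies
print(bulk_copies_from_client, ratio)                # 6 3.0
```

So under these assumptions the bulk loader puts 3x the traffic on the link between data centers.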
>
>
>
> On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang <bewang.t...@gmail.com> wrote:
>
>> I set up two virtual data centers, one for analytics and one for a REST
>> service. The analytics data center sits on top of the Hadoop cluster. I
>> want to bulk load my ETL results into the analytics data center so that
>> the REST service won't take the heavy load. I'm using CQLTableInputFormat
>> in my Spark application, and I gave the nodes in the analytics data
>> center as the initial addresses.
>>
>> However, I found my jobs were connecting to the REST service data center.
>>
>> How can I specify the data center?
>>
>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>


