[jira] [Commented] (CASSANDRA-13269) Snapshot support for custom secondary indices

2017-02-25 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884575#comment-15884575
 ] 

Jeff Jirsa commented on CASSANDRA-13269:


Any of you other custom-secondary-index folks ( [~jjordan] / [~iamaleksey] , or 
[~adelapena] ) eager to review? 


> Snapshot support for custom secondary indices
> -
>
> Key: CASSANDRA-13269
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13269
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Trivial
>  Labels: features
> Fix For: 3.0.12, 3.11.0
>
> Attachments: 0001-CASSANDRA-13269-custom-indices-snapshot.patch
>
>
> Enhance the index API to support snapshot of custom secondary indices.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13270) Add function hooks to deliver Elasticsearch as a Cassandra plugin

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-13270:
--
Attachment: 0001-CASSANDRA-13270-elasticsearch-as-a-plugin.patch

> Add function hooks to deliver Elasticsearch as a Cassandra plugin
> -
>
> Key: CASSANDRA-13270
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13270
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Minor
>  Labels: features
> Fix For: 3.0.12, 3.11.0
>
> Attachments: 0001-CASSANDRA-13270-elasticsearch-as-a-plugin.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> With these basic modifications (see the patch) and the following tickets, the 
> Elassandra project (see https://github.com/strapdata/elassandra) could be an 
> Elasticsearch plugin for Cassandra.
> * CASSANDRA-12837 Add multi-threaded support to nodetool rebuild_index.
> * CASSANDRA-13267 Add CQL functions.
> * CASSANDRA-13268 Allow to create custom secondary index on static columns.
> * CASSANDRA-13269 Snapshot support for custom secondary indices





[jira] [Updated] (CASSANDRA-13270) Add function hooks to deliver Elasticsearch as a Cassandra plugin

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-13270:
--
Reproduced In: 3.0.11
   Status: Patch Available  (was: Open)

See attached patch

> Add function hooks to deliver Elasticsearch as a Cassandra plugin
> -
>
> Key: CASSANDRA-13270
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13270
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Minor
>  Labels: features
> Fix For: 3.0.12, 3.11.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> With these basic modifications (see the patch) and the following tickets, the 
> Elassandra project (see https://github.com/strapdata/elassandra) could be an 
> Elasticsearch plugin for Cassandra.
> * CASSANDRA-12837 Add multi-threaded support to nodetool rebuild_index.
> * CASSANDRA-13267 Add CQL functions.
> * CASSANDRA-13268 Allow to create custom secondary index on static columns.
> * CASSANDRA-13269 Snapshot support for custom secondary indices





[jira] [Created] (CASSANDRA-13270) Add function hooks to deliver Elasticsearch as a Cassandra plugin

2017-02-25 Thread vincent royer (JIRA)
vincent royer created CASSANDRA-13270:
-

 Summary: Add function hooks to deliver Elasticsearch as a 
Cassandra plugin
 Key: CASSANDRA-13270
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13270
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: vincent royer
Priority: Minor
 Fix For: 3.0.12, 3.11.0


With these basic modifications (see the patch) and the following tickets, the 
Elassandra project (see https://github.com/strapdata/elassandra) could be an 
Elasticsearch plugin for Cassandra.
* CASSANDRA-12837 Add multi-threaded support to nodetool rebuild_index.
* CASSANDRA-13267 Add CQL functions.
* CASSANDRA-13268 Allow to create custom secondary index on static columns.
* CASSANDRA-13269 Snapshot support for custom secondary indices






[jira] [Updated] (CASSANDRA-13269) Snapshot support for custom secondary indices

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-13269:
--
Attachment: 0001-CASSANDRA-13269-custom-indices-snapshot.patch

> Snapshot support for custom secondary indices
> -
>
> Key: CASSANDRA-13269
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13269
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Trivial
>  Labels: features
> Fix For: 3.0.12, 3.11.0
>
> Attachments: 0001-CASSANDRA-13269-custom-indices-snapshot.patch
>
>
> Enhance the index API to support snapshot of custom secondary indices.





[jira] [Updated] (CASSANDRA-13269) Snapshot support for custom secondary indices

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-13269:
--
Labels: features  (was: )
Status: Patch Available  (was: Open)

Here is an implementation that snapshots custom secondary indices when 
snapshotting SSTables. With this feature, Elassandra can already make 
consistent snapshots of both SSTables and Elasticsearch Lucene files.

> Snapshot support for custom secondary indices
> -
>
> Key: CASSANDRA-13269
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13269
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Trivial
>  Labels: features
> Fix For: 3.0.12, 3.11.0
>
>
> Enhance the index API to support snapshot of custom secondary indices.





[jira] [Created] (CASSANDRA-13269) Snapshot support for custom secondary indices

2017-02-25 Thread vincent royer (JIRA)
vincent royer created CASSANDRA-13269:
-

 Summary: Snapshot support for custom secondary indices
 Key: CASSANDRA-13269
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13269
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: vincent royer
Priority: Trivial
 Fix For: 3.0.12, 3.11.0


Enhance the index API to support snapshot of custom secondary indices.





[jira] [Updated] (CASSANDRA-13268) Allow to create custom secondary index on static columns

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-13268:
--
   Labels: features  (was: )
Reproduced In: 3.0.10
   Status: Patch Available  (was: Open)

This patch allows creating a custom secondary index on a static column with an 
option, as follows:
CREATE TABLE test.t2 (
a int,
b text,
c text static,
PRIMARY KEY (a, b)
);
CREATE CUSTOM INDEX my_idx ON test.t2 (c) USING 'a.class' WITH OPTIONS = 
{'enforce': 'true'};

> Allow to create custom secondary index on static columns
> 
>
> Key: CASSANDRA-13268
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13268
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core, CQL
>Reporter: vincent royer
>Priority: Trivial
>  Labels: features
> Fix For: 3.0.x
>
> Attachments: 0001-CASSANDRA-13268-custom-index-on-static-columns.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Custom secondary index implementations (like Elassandra) could benefit from 
> indexing static columns, even if they are not searchable with CQL. Here is a 
> proposal to allow index creation on static columns.





[jira] [Updated] (CASSANDRA-13268) Allow to create custom secondary index on static columns

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-13268:
--
Attachment: 0001-CASSANDRA-13268-custom-index-on-static-columns.patch

> Allow to create custom secondary index on static columns
> 
>
> Key: CASSANDRA-13268
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13268
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core, CQL
>Reporter: vincent royer
>Priority: Trivial
>  Labels: features
> Fix For: 3.0.x
>
> Attachments: 0001-CASSANDRA-13268-custom-index-on-static-columns.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Custom secondary index implementations (like Elassandra) could benefit from 
> indexing static columns, even if they are not searchable with CQL. Here is a 
> proposal to allow index creation on static columns.





[jira] [Created] (CASSANDRA-13268) Allow to create custom secondary index on static columns

2017-02-25 Thread vincent royer (JIRA)
vincent royer created CASSANDRA-13268:
-

 Summary: Allow to create custom secondary index on static columns
 Key: CASSANDRA-13268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13268
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, CQL
Reporter: vincent royer
Priority: Trivial
 Fix For: 3.0.x


Custom secondary index implementations (like Elassandra) could benefit from 
indexing static columns, even if they are not searchable with CQL. Here is a 
proposal to allow index creation on static columns.





[jira] [Updated] (CASSANDRA-13267) Add new CQL functions

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-13267:
--
Attachment: 0001-CASSANDRA-13267-Add-CQL-functions.patch

> Add new CQL functions
> -
>
> Key: CASSANDRA-13267
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13267
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: vincent royer
>Priority: Trivial
>  Labels: features
> Fix For: 3.0.x
>
> Attachments: 0001-CASSANDRA-13267-Add-CQL-functions.patch
>
>
> Introduce two new CQL functions:
> - toString(x) converts a column to its string representation.
> - toJsonArray(x, y, z...) generates a JSON array of JSON strings.
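
As a rough illustration of the semantics described in the ticket (this is not the code from the attached patch), the Java sketch below shows one way a toJsonArray-style helper could render its arguments as a JSON array of JSON strings. The class name JsonArraySketch, the handling of null, and the escaping rules are all assumptions made for the example.

```java
import java.util.StringJoiner;

/** Hypothetical sketch; not the implementation from the CASSANDRA-13267 patch. */
final class JsonArraySketch {

    private JsonArraySketch() {}

    /** Renders a value as text, loosely mirroring what toString(x) is described to do. */
    static String toStringValue(Object value) {
        return String.valueOf(value);
    }

    /** Wraps s in double quotes and escapes characters that may not appear raw in a JSON string. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds a JSON array of JSON strings (assumption: a null argument becomes JSON null). */
    static String toJsonArray(Object... values) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (Object v : values) {
            joiner.add(v == null ? "null" : quote(toStringValue(v)));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // prints ["1","b","say \"hi\""]
        System.out.println(toJsonArray(1, "b", "say \"hi\""));
    }
}
```

Every element is emitted as a string, even when the argument is numeric, which matches the "JSON array of JSON string" wording of the ticket.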





[jira] [Updated] (CASSANDRA-13267) Add new CQL functions

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-13267:
--
   Labels: features  (was: )
Fix Version/s: (was: 3.11.x)
   3.0.x
Reproduced In: 3.0.11
   Status: Patch Available  (was: Open)

> Add new CQL functions
> -
>
> Key: CASSANDRA-13267
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13267
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: vincent royer
>Priority: Trivial
>  Labels: features
> Fix For: 3.0.x
>
>
> Introduce two new CQL functions:
> - toString(x) converts a column to its string representation.
> - toJsonArray(x, y, z...) generates a JSON array of JSON strings.





[jira] [Created] (CASSANDRA-13267) Add new CQL functions

2017-02-25 Thread vincent royer (JIRA)
vincent royer created CASSANDRA-13267:
-

 Summary: Add new CQL functions
 Key: CASSANDRA-13267
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13267
 Project: Cassandra
  Issue Type: Improvement
  Components: CQL
Reporter: vincent royer
Priority: Trivial
 Fix For: 3.11.x


Introduce two new CQL functions:
- toString(x) converts a column to its string representation.
- toJsonArray(x, y, z...) generates a JSON array of JSON strings.





[jira] [Updated] (CASSANDRA-12837) Add multi-threaded support to nodetool rebuild_index

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-12837:
--
Attachment: 0001-CASSANDRA-12837-multi-threaded-rebuild_index.patch

Patch for Cassandra 3.0.11 (successfully tested on Cassandra 2.2 and 3.0.x)

> Add multi-threaded support to nodetool rebuild_index
> 
>
> Key: CASSANDRA-12837
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12837
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Minor
>  Labels: patch, performance
> Fix For: 3.0.12, 4.x
>
> Attachments: 0001-CASSANDRA-12837-multi-threaded-rebuild_index.patch, 
> CASSANDRA-12837-2.2.9.txt
>
>
> Add multi-threaded support to nodetool rebuild_index to improve performance.
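
The ticket text does not say how the rebuild is parallelised, so the following is only a generic sketch of the idea, not the attached patch: independent units of work are submitted to a fixed-size thread pool and the caller waits for all of them. The class ParallelRebuildSketch, the rebuildAll method, and the notion of a "unit" are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical sketch of a parallel rebuild; not taken from the CASSANDRA-12837 patch. */
final class ParallelRebuildSketch {

    private ParallelRebuildSketch() {}

    /**
     * Runs rebuildOne for every unit on a fixed-size pool and returns
     * the number of units that completed without throwing.
     */
    static <T> int rebuildAll(List<T> units, int threads, Consumer<T> rebuildOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (T unit : units) {
                futures.add(pool.submit(() -> rebuildOne.accept(unit)));
            }
            int completed = 0;
            for (Future<?> future : futures) {
                try {
                    future.get(); // waits for the task; a worker failure surfaces here
                    completed++;
                } catch (ExecutionException e) {
                    System.err.println("rebuild failed: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for rebuild", e);
                }
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in unit names; what a real rebuild iterates over is an assumption here.
        List<String> units = Arrays.asList("unit-1", "unit-2", "unit-3", "unit-4");
        int done = rebuildAll(units, 2, unit -> System.out.println("rebuilt " + unit));
        System.out.println(done + " units rebuilt");
    }
}
```

A fixed pool keeps the number of concurrent rebuild tasks bounded, so the thread count can be tuned against the I/O capacity of the node.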





[jira] [Updated] (CASSANDRA-12837) Add multi-threaded support to nodetool rebuild_index

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-12837:
--
Labels: patch performance  (was: patch)

> Add multi-threaded support to nodetool rebuild_index
> 
>
> Key: CASSANDRA-12837
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12837
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Minor
>  Labels: patch, performance
> Fix For: 3.0.12, 4.x
>
> Attachments: CASSANDRA-12837-2.2.9.txt
>
>
> Add multi-threaded support to nodetool rebuild_index to improve performance.





[jira] [Updated] (CASSANDRA-12837) Add multi-threaded support to nodetool rebuild_index

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-12837:
--
Reproduced In: 3.0.11, 2.2.x  (was: 2.2.x)

> Add multi-threaded support to nodetool rebuild_index
> 
>
> Key: CASSANDRA-12837
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12837
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Minor
>  Labels: patch
> Fix For: 3.0.12, 4.x
>
> Attachments: CASSANDRA-12837-2.2.9.txt
>
>
> Add multi-threaded support to nodetool rebuild_index to improve performance.





[jira] [Updated] (CASSANDRA-12837) Add multi-threaded support to nodetool rebuild_index

2017-02-25 Thread vincent royer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vincent royer updated CASSANDRA-12837:
--
Fix Version/s: 3.0.12

> Add multi-threaded support to nodetool rebuild_index
> 
>
> Key: CASSANDRA-12837
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12837
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Minor
>  Labels: patch
> Fix For: 3.0.12, 4.x
>
> Attachments: CASSANDRA-12837-2.2.9.txt
>
>
> Add multi-threaded support to nodetool rebuild_index to improve performance.





[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2017-02-25 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884366#comment-15884366
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

[~slebresne] Would you please review this 
[patch|https://github.com/cooldoger/cassandra/commit/bf6bc14a130dae64cb859e81ad54b21d5434d46a]?

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.





[jira] [Updated] (CASSANDRA-13266) Bulk loading sometimes is very slow?

2017-02-25 Thread liangsibin (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liangsibin updated CASSANDRA-13266:
---
Description: 
When I bulk load SSTables created with CQLSSTableWriter, the load is sometimes 
very slow.
CQLSSTableWriter is configured with withBufferSizeInMB set to 32 MB.
Two nodes are used to write the SSTables and bulk load them:
1. Use CQLSSTableWriter to create the SSTables (60 threads).
2. When the directory exceeds 10 rows, bulk load the directory (20 threads).
The normal bulk load speed is about 70M/s per node, and bulk loading 141G of 
SSTables per node takes 90 minutes. Sometimes it is very slow, however, and the 
same data takes 4 hours. Why?
Here is the code that bulk loads the SSTables:
{code:java}
public class JmxBulkLoader {

    static final Logger LOGGER = LoggerFactory.getLogger(JmxBulkLoader.class);
    private JMXConnector connector;
    private StorageServiceMBean storageBean;
    private Timer timer = new Timer();

    public JmxBulkLoader(String host, int port) throws Exception {
        connect(host, port);
    }

    private void connect(String host, int port) throws IOException, MalformedObjectNameException {
        JMXServiceURL jmxUrl = new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        Map env = new HashMap();
        connector = JMXConnectorFactory.connect(jmxUrl, env);
        MBeanServerConnection mbeanServerConn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
        storageBean = JMX.newMBeanProxy(mbeanServerConn, name, StorageServiceMBean.class);
    }

    public void close() throws IOException {
        connector.close();
    }

    public void bulkLoad(String path) {
        LOGGER.info("begin load data to cassandra " + new Path(path).getName());
        timer.start();
        storageBean.bulkLoad(path);
        timer.end();
        LOGGER.info("bulk load took " + timer.getTimeTakenMillis() + "ms, path: " + new Path(path).getName());
    }
}
{code}
Bulk load thread:
{code:java} 
public class BulkThread implements Runnable {

    private String path;
    private String jmxHost;
    private int jmxPort;

    public BulkThread(String path, String jmxHost, int jmxPort) {
        super();
        this.path = path;
        this.jmxHost = jmxHost;
        this.jmxPort = jmxPort;
    }

    @Override
    public void run() {
        JmxBulkLoader bulkLoader = null;
        try {
            bulkLoader = new JmxBulkLoader(jmxHost, jmxPort);
            bulkLoader.bulkLoad(path);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (bulkLoader != null)
                try {
                    bulkLoader.close();
                    bulkLoader = null;
                } catch (IOException e) {
                    e.printStackTrace();
                }
        }
    }
}
{code}
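
To narrow down which directories are slow, one option is to time every load individually while it runs on the 20-thread pool. The sketch below is not part of the reported code: TimedBulkLoadSketch, loadAll, and the example paths are hypothetical, and the loader argument is a stand-in for the storageBean.bulkLoad(path) call shown above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: runs loads on a fixed pool and records the duration of each one. */
final class TimedBulkLoadSketch {

    private TimedBulkLoadSketch() {}

    /** Returns the elapsed time in milliseconds for each path, keyed by path. */
    static Map<String, Long> loadAll(List<String> paths, int threads, Consumer<String> loader) {
        Map<String, Long> elapsedMillis = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String path : paths) {
            pool.execute(() -> {
                long start = System.nanoTime();
                try {
                    loader.accept(path);
                } finally {
                    // Record the duration even when the load throws.
                    long nanos = System.nanoTime() - start;
                    elapsedMillis.put(path, TimeUnit.NANOSECONDS.toMillis(nanos));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.DAYS)) {
                throw new IllegalStateException("bulk loads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for bulk loads", e);
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        // The paths are placeholders; the empty lambda stands in for the JMX bulkLoad call.
        Map<String, Long> timings = loadAll(Arrays.asList("/data/dir1", "/data/dir2"), 20,
                path -> { });
        timings.forEach((path, ms) -> System.out.println(path + " took " + ms + " ms"));
    }
}
```

Comparing the per-directory timings of a fast run and a slow run would show whether the slowdown is spread evenly or concentrated in a few loads.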

  was:
When I bulk load SSTables created with CQLSSTableWriter, the load is sometimes 
very slow.
Two nodes are used to write the SSTables and bulk load them:
1. Use CQLSSTableWriter to create the SSTables (60 threads).
2. When the directory exceeds 10 rows, bulk load the directory (20 threads).
The normal bulk load speed is about 70M/s per node, and bulk loading 141G of 
SSTables per node takes 90 minutes. Sometimes it is very slow, however, and the 
same data takes 4 hours. Why?
Here is the code that bulk loads the SSTables:
{code:java}
public class JmxBulkLoader {

    static final Logger LOGGER = LoggerFactory.getLogger(JmxBulkLoader.class);
    private JMXConnector connector;
    private StorageServiceMBean storageBean;
    private Timer timer = new Timer();

    public JmxBulkLoader(String host, int port) throws Exception {
        connect(host, port);
    }

    private void connect(String host, int port) throws IOException, MalformedObjectNameException {
        JMXServiceURL jmxUrl = new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        Map env = new HashMap();
        connector = JMXConnectorFactory.connect(jmxUrl, env);
        MBeanServerConnection mbeanServerConn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
        storageBean = JMX.newMBeanProxy(mbeanServerConn, name, StorageServiceMBean.class);
    }

    public void

[jira] [Updated] (CASSANDRA-13266) Bulk loading sometimes is very slow?

2017-02-25 Thread liangsibin (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liangsibin updated CASSANDRA-13266:
---
Description: 
When I bulk load SSTables created with CQLSSTableWriter, the load is sometimes 
very slow.
Two nodes are used to write the SSTables and bulk load them:
1. Use CQLSSTableWriter to create the SSTables (60 threads).
2. When the directory exceeds 10 rows, bulk load the directory (20 threads).
The normal bulk load speed is about 70M/s per node, and bulk loading 141G of 
SSTables per node takes 90 minutes. Sometimes it is very slow, however, and the 
same data takes 4 hours. Why?
Here is the code that bulk loads the SSTables:
{code:java}
public class JmxBulkLoader {

    static final Logger LOGGER = LoggerFactory.getLogger(JmxBulkLoader.class);
    private JMXConnector connector;
    private StorageServiceMBean storageBean;
    private Timer timer = new Timer();

    public JmxBulkLoader(String host, int port) throws Exception {
        connect(host, port);
    }

    private void connect(String host, int port) throws IOException, MalformedObjectNameException {
        JMXServiceURL jmxUrl = new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        Map env = new HashMap();
        connector = JMXConnectorFactory.connect(jmxUrl, env);
        MBeanServerConnection mbeanServerConn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
        storageBean = JMX.newMBeanProxy(mbeanServerConn, name, StorageServiceMBean.class);
    }

    public void close() throws IOException {
        connector.close();
    }

    public void bulkLoad(String path) {
        LOGGER.info("begin load data to cassandra " + new Path(path).getName());
        timer.start();
        storageBean.bulkLoad(path);
        timer.end();
        LOGGER.info("bulk load took " + timer.getTimeTakenMillis() + "ms, path: " + new Path(path).getName());
    }
}
{code}
Bulk load thread:
{code:java} 
public class BulkThread implements Runnable {

    private String path;
    private String jmxHost;
    private int jmxPort;

    public BulkThread(String path, String jmxHost, int jmxPort) {
        super();
        this.path = path;
        this.jmxHost = jmxHost;
        this.jmxPort = jmxPort;
    }

    @Override
    public void run() {
        JmxBulkLoader bulkLoader = null;
        try {
            bulkLoader = new JmxBulkLoader(jmxHost, jmxPort);
            bulkLoader.bulkLoad(path);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (bulkLoader != null)
                try {
                    bulkLoader.close();
                    bulkLoader = null;
                } catch (IOException e) {
                    e.printStackTrace();
                }
        }
    }
}
{code}

  was:
When I bulk load SSTables created with CQLSSTableWriter, the load is sometimes 
very slow.
Two nodes are used to write the SSTables and bulk load them:
1. Use CQLSSTableWriter to create the SSTables (60 threads).
2. When the directory exceeds 10 rows, bulk load the directory (20 threads).
The normal bulk load speed is about 70M/s per node, and bulk loading 141G of 
SSTables takes 90 minutes. Sometimes it is very slow, however, and the same 
data takes 4 hours. Why?
Here is the code that bulk loads the SSTables:
{code:java}
public class JmxBulkLoader {

    static final Logger LOGGER = LoggerFactory.getLogger(JmxBulkLoader.class);
    private JMXConnector connector;
    private StorageServiceMBean storageBean;
    private Timer timer = new Timer();

    public JmxBulkLoader(String host, int port) throws Exception {
        connect(host, port);
    }

    private void connect(String host, int port) throws IOException, MalformedObjectNameException {
        JMXServiceURL jmxUrl = new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        Map env = new HashMap();
        connector = JMXConnectorFactory.connect(jmxUrl, env);
        MBeanServerConnection mbeanServerConn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
        storageBean = JMX.newMBeanProxy(mbeanServerConn, name, StorageServiceMBean.class);
    }

    public void close() throws IOException {

[jira] [Updated] (CASSANDRA-13266) Bulk loading sometimes is very slow?

2017-02-25 Thread liangsibin (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liangsibin updated CASSANDRA-13266:
---
Description: 
When I bulk load SSTables created with CQLSSTableWriter, the load is sometimes 
very slow.
Two nodes are used to write the SSTables and bulk load them:
1. Use CQLSSTableWriter to create the SSTables (60 threads).
2. When the directory exceeds 10 rows, bulk load the directory (20 threads).
The normal bulk load speed is about 70M/s per node, and bulk loading 141G of 
SSTables takes 90 minutes. Sometimes it is very slow, however, and the same 
data takes 4 hours. Why?
Here is the code that bulk loads the SSTables:
{code:java}
public class JmxBulkLoader {

    static final Logger LOGGER = LoggerFactory.getLogger(JmxBulkLoader.class);
    private JMXConnector connector;
    private StorageServiceMBean storageBean;
    private Timer timer = new Timer();

    public JmxBulkLoader(String host, int port) throws Exception {
        connect(host, port);
    }

    private void connect(String host, int port) throws IOException, MalformedObjectNameException {
        JMXServiceURL jmxUrl = new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        Map env = new HashMap();
        connector = JMXConnectorFactory.connect(jmxUrl, env);
        MBeanServerConnection mbeanServerConn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
        storageBean = JMX.newMBeanProxy(mbeanServerConn, name, StorageServiceMBean.class);
    }

    public void close() throws IOException {
        connector.close();
    }

    public void bulkLoad(String path) {
        LOGGER.info("begin load data to cassandra " + new Path(path).getName());
        timer.start();
        storageBean.bulkLoad(path);
        timer.end();
        LOGGER.info("bulk load took " + timer.getTimeTakenMillis() + "ms, path: " + new Path(path).getName());
    }
}
{code}
Bulk load thread:
{code:java} 
public class BulkThread implements Runnable {

    private String path;
    private String jmxHost;
    private int jmxPort;

    public BulkThread(String path, String jmxHost, int jmxPort) {
        super();
        this.path = path;
        this.jmxHost = jmxHost;
        this.jmxPort = jmxPort;
    }

    @Override
    public void run() {
        JmxBulkLoader bulkLoader = null;
        try {
            bulkLoader = new JmxBulkLoader(jmxHost, jmxPort);
            bulkLoader.bulkLoad(path);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (bulkLoader != null)
                try {
                    bulkLoader.close();
                    bulkLoader = null;
                } catch (IOException e) {
                    e.printStackTrace();
                }
        }
    }
}
{code}

  was:
When I bulk load SSTables created with CQLSSTableWriter, the load is sometimes 
very slow.
Two nodes are used to write the SSTables and bulk load them:
1. Use CQLSSTableWriter to create the SSTables (60 threads).
2. When the directory exceeds 10 rows, bulk load the directory (20 threads).
The normal bulk load speed is about 70M/s per node, and bulk loading 141G of 
SSTables takes 90 minutes. Sometimes it is very slow, however, and the same 
data takes 4 hours. Why?
Here is the code that bulk loads the SSTables:

|public class JmxBulkLoader {

    static final Logger LOGGER = LoggerFactory.getLogger(JmxBulkLoader.class);
    private JMXConnector connector;
    private StorageServiceMBean storageBean;
    private Timer timer = new Timer();

    public JmxBulkLoader(String host, int port) throws Exception {
        connect(host, port);
    }

    private void connect(String host, int port) throws IOException, MalformedObjectNameException {
        JMXServiceURL jmxUrl = new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        Map env = new HashMap();
        connector = JMXConnectorFactory.connect(jmxUrl, env);
        MBeanServerConnection mbeanServerConn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
        storageBean = JMX.newMBeanProxy(mbeanServerConn, name, StorageServiceMBean.class);
    }

    public void close() throws IOException {
        connector.close();
    }

 

[jira] [Updated] (CASSANDRA-13266) Bulk loading sometimes is very slow?

2017-02-25 Thread liangsibin (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liangsibin updated CASSANDRA-13266:
---
Description: 
When I bulk load SSTables created with CQLSSTableWriter, the load is sometimes 
very slow.
Two nodes are used to write the SSTables and bulk load them:
1. Use CQLSSTableWriter to create the SSTables (60 threads).
2. When the directory exceeds 10 rows, bulk load the directory (20 threads).
The normal bulk load speed is about 70M/s per node, and bulk loading 141G of 
SSTables takes 90 minutes. Sometimes it is very slow, however, and the same 
data takes 4 hours. Why?
Here is the code that bulk loads the SSTables:

|public class JmxBulkLoader {

    static final Logger LOGGER = LoggerFactory.getLogger(JmxBulkLoader.class);
    private JMXConnector connector;
    private StorageServiceMBean storageBean;
    private Timer timer = new Timer();

    public JmxBulkLoader(String host, int port) throws Exception {
        connect(host, port);
    }

    private void connect(String host, int port) throws IOException, MalformedObjectNameException {
        JMXServiceURL jmxUrl = new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        Map env = new HashMap();
        connector = JMXConnectorFactory.connect(jmxUrl, env);
        MBeanServerConnection mbeanServerConn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
        storageBean = JMX.newMBeanProxy(mbeanServerConn, name, StorageServiceMBean.class);
    }

    public void close() throws IOException {
        connector.close();
    }

    public void bulkLoad(String path) {
        LOGGER.info("begin load data to cassandra " + new Path(path).getName());
        timer.start();
        storageBean.bulkLoad(path);
        timer.end();
        LOGGER.info("bulk load took " + timer.getTimeTakenMillis() + "ms, path: " + new Path(path).getName());
    }
}
Bulk load thread:
|public class BulkThread implements Runnable {

    private String path;
    private String jmxHost;
    private int jmxPort;

    public BulkThread(String path, String jmxHost, int jmxPort) {
        super();
        this.path = path;
        this.jmxHost = jmxHost;
        this.jmxPort = jmxPort;
    }

    @Override
    public void run() {
        JmxBulkLoader bulkLoader = null;
        try {
            bulkLoader = new JmxBulkLoader(jmxHost, jmxPort);
            bulkLoader.bulkLoad(path);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (bulkLoader != null)
                try {
                    bulkLoader.close();
                    bulkLoader = null;
                } catch (IOException e) {
                    e.printStackTrace();
                }
        }
    }
}

  was:
When I bulk load SSTables created with CQLSSTableWriter, the load is sometimes 
very slow.
Two nodes are used to write the SSTables and bulk load them:
1. Use CQLSSTableWriter to create the SSTables (60 threads).
2. When the directory exceeds 10 rows, bulk load the directory (20 threads).
The normal bulk load speed is about 70M/s per node, and bulk loading 141G of 
SSTables takes 90 minutes. Sometimes it is very slow, however, and the same 
data takes 4 hours. Why?
Here is the code that bulk loads the SSTables:

public class JmxBulkLoader {

    static final Logger LOGGER = LoggerFactory.getLogger(JmxBulkLoader.class);
    private JMXConnector connector;
    private StorageServiceMBean storageBean;
    private Timer timer = new Timer();

    public JmxBulkLoader(String host, int port) throws Exception {
        connect(host, port);
    }

    private void connect(String host, int port) throws IOException, MalformedObjectNameException {
        JMXServiceURL jmxUrl = new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        Map env = new HashMap();
        connector = JMXConnectorFactory.connect(jmxUrl, env);
        MBeanServerConnection mbeanServerConn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
        storageBean = JMX.newMBeanProxy(mbeanServerConn, name, StorageServiceMBean.class);
    }

    public void close() throws IOException {
        connector.close();
    }

    public void bulkLoad(String

[jira] [Commented] (CASSANDRA-13259) Use platform specific X.509 default algorithm

2017-02-25 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884238#comment-15884238
 ] 

Robert Stupp commented on CASSANDRA-13259:
--

One minor nit: Can you add an entry to NEWS.txt about this change?

> Use platform specific X.509 default algorithm
> -
>
> Key: CASSANDRA-13259
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13259
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Reporter: Stefan Podkowinski
> Assignee: Stefan Podkowinski
> Priority: Minor
> Fix For: 4.x
>
>
> We should replace the hardcoded "SunX509" default algorithm and use the JRE 
> default instead. This implementation will currently not work on less popular 
> platforms (e.g. IBM) and won't get any further updates.
> See also:
> https://bugs.openjdk.java.net/browse/JDK-8169745
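
For readers unfamiliar with the JRE default mentioned in the description, here is a minimal sketch of asking the platform for its default algorithm rather than hardcoding "SunX509". The class and method names are hypothetical and are not taken from the attached patch; only the javax.net.ssl calls are standard JDK API.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper for illustration only; not part of the Cassandra patch.
final class DefaultX509Algorithms {
    private DefaultX509Algorithms() {}

    // Builds a KeyManagerFactory for whatever algorithm the running JRE
    // declares as its default, instead of a hardcoded name.
    static KeyManagerFactory defaultKeyManagerFactory() throws NoSuchAlgorithmException {
        return KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
    }

    // Same idea for the trust manager side.
    static TrustManagerFactory defaultTrustManagerFactory() throws NoSuchAlgorithmException {
        return TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // The printed names depend on the JRE vendor and its security properties.
        System.out.println("key manager algorithm:   " + defaultKeyManagerFactory().getAlgorithm());
        System.out.println("trust manager algorithm: " + defaultTrustManagerFactory().getAlgorithm());
    }
}
```

The factories still have to be initialized with a key store or trust store before use; that part is omitted here.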





[jira] [Created] (CASSANDRA-13266) Bulk loading sometimes is very slow?

2017-02-25 Thread liangsibin (JIRA)
liangsibin created CASSANDRA-13266:
--

 Summary: Bulk loading sometimes is very slow?
 Key: CASSANDRA-13266
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13266
 Project: Cassandra
  Issue Type: Improvement
Reporter: liangsibin


When I bulk load SSTables created with CQLSSTableWriter, it is sometimes very 
slow.
Two nodes write SSTables and bulk load them:
1. Use CQLSSTableWriter to create SSTables (60 threads).
2. When a directory holds over 10 rows, bulk load the directory (20 threads).
The normal bulk load speed is about 70M/s per node, and bulk loading 141G of 
SSTables takes 90 minutes. Sometimes it is very slow, though: the same data 
takes 4 hours. Why?
Here is the code that bulk loads the SSTables:

public class JmxBulkLoader {

    static final Logger LOGGER = LoggerFactory.getLogger(JmxBulkLoader.class);
    private JMXConnector connector;
    private StorageServiceMBean storageBean;
    private Timer timer = new Timer();

    public JmxBulkLoader(String host, int port) throws Exception {
        connect(host, port);
    }

    private void connect(String host, int port) throws IOException, MalformedObjectNameException {
        JMXServiceURL jmxUrl = new JMXServiceURL(
            String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        Map env = new HashMap();
        connector = JMXConnectorFactory.connect(jmxUrl, env);
        MBeanServerConnection mbeanServerConn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
        storageBean = JMX.newMBeanProxy(mbeanServerConn, name, StorageServiceMBean.class);
    }

    public void close() throws IOException {
        connector.close();
    }

    public void bulkLoad(String path) {
        LOGGER.info("begin load data to cassandra " + new Path(path).getName());
        timer.start();
        storageBean.bulkLoad(path);
        timer.end();
        LOGGER.info("bulk load took " + timer.getTimeTakenMillis() + "ms, path: " + new Path(path).getName());
    }
}

bulkload thread 
public class BulkThread implements Runnable {

    private String path;
    private String jmxHost;
    private int jmxPort;

    public BulkThread(String path, String jmxHost, int jmxPort) {
        super();
        this.path = path;
        this.jmxHost = jmxHost;
        this.jmxPort = jmxPort;
    }

    @Override
    public void run() {
        JmxBulkLoader bulkLoader = null;
        try {
            bulkLoader = new JmxBulkLoader(jmxHost, jmxPort);
            bulkLoader.bulkLoad(path);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (bulkLoader != null) {
                try {
                    bulkLoader.close();
                    bulkLoader = null;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
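
One way to narrow down where the extra time goes might be to record the elapsed time per directory while the loads run on a fixed pool, as in the 20-thread setup described above. The following is only a sketch: TimedBulkLoads, loadAll, and the paths in main are hypothetical names, and the loader passed in is a stand-in that sleeps instead of calling JMX.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical timing harness; names and structure are illustrative assumptions.
final class TimedBulkLoads {
    private TimedBulkLoads() {}

    // Runs loader.accept(path) for every path on a fixed-size pool and
    // returns the elapsed wall-clock milliseconds recorded for each path.
    static Map<String, Long> loadAll(List<String> paths, int threads, Consumer<String> loader)
            throws InterruptedException {
        Map<String, Long> elapsedMillis = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String path : paths) {
            pool.execute(() -> {
                long start = System.nanoTime();
                try {
                    loader.accept(path);
                } finally {
                    // Recorded even if the loader throws, so failed loads still show up.
                    elapsedMillis.put(path, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
                }
            });
        }
        pool.shutdown();
        // Wait until every submitted load has finished.
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        return elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        // Made-up directory names; the loader below only sleeps.
        List<String> paths = Arrays.asList("/tmp/sstables/dir1", "/tmp/sstables/dir2", "/tmp/sstables/dir3");
        Map<String, Long> elapsed = loadAll(paths, 2, path -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Print the slowest directories first.
        elapsed.entrySet().stream()
               .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
               .forEach(e -> System.out.println(e.getValue() + " ms  " + e.getKey()));
    }
}
```

With the classes from the report, the stand-in loader could be replaced by something like `path -> new BulkThread(path, jmxHost, jmxPort).run()`. Comparing the per-directory times of a 90-minute run with those of a 4-hour run would show whether a few directories are slow or all of them are.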





[jira] [Updated] (CASSANDRA-13259) Use platform specific X.509 default algorithm

2017-02-25 Thread Jason Brown (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-13259:

Status: Ready to Commit  (was: Patch Available)






[jira] [Commented] (CASSANDRA-13259) Use platform specific X.509 default algorithm

2017-02-25 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884237#comment-15884237
 ] 

Jason Brown commented on CASSANDRA-13259:
-

+1






[jira] [Commented] (CASSANDRA-12653) In-flight shadow round requests

2017-02-25 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884235#comment-15884235
 ] 

Jason Brown commented on CASSANDRA-12653:
-

[~spod] I'm on board with {{firstSynSendAt}} being a timestamp, I'm just 
questioning what you compare it against. Instead of this:

{code}
long ts = epStateMap.values().iterator().next().getUpdateTimestamp();
if ((ts - Gossiper.instance.firstSynSendAt) < 0 || 
Gossiper.instance.firstSynSendAt == 0)
{code}

I'm suggesting this:
{code}
if ((System.nanoTime() - Gossiper.instance.firstSynSendAt) < 0 || 
Gossiper.instance.firstSynSendAt == 0)
{code}

It amounts to the same thing (a check against {{System.nanoTime()}}), but it's 
more obvious where the value comes from.

[~jkni] wrt {{synchronized}}, on one hand I agree with you, but then that 
makes the argument for making many more of the methods on {{Gossiper}} 
synchronized. If we're going to make this method different from the rest (which 
is fine with me), can we add a relevant, detailed comment explaining why?
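
As background for the subtraction-based comparison in the snippets above: the System.nanoTime() Javadoc recommends comparing two readings with {{t1 - t0 < 0}} rather than {{t1 < t0}}, because the counter may overflow. A minimal sketch of that idiom follows. The class, constant, and method names are hypothetical and do not come from the Gossiper code.

```java
// Hypothetical helper illustrating overflow-safe nanoTime comparisons.
final class NanoTimestamps {
    // Sentinel meaning "no timestamp recorded yet", mirroring the == 0 check above.
    static final long UNSET = 0L;

    private NanoTimestamps() {}

    // True if `timestamp` was taken before `reference`. Both values are assumed
    // to come from System.nanoTime() in the same JVM. Subtracting first keeps
    // the result correct even if the counter wrapped between the two readings.
    static boolean isBefore(long timestamp, long reference) {
        return timestamp - reference < 0;
    }

    // Same shape as the check under discussion: true when no first SYN has
    // been recorded, or when the given timestamp predates it.
    static boolean predatesFirstSyn(long timestamp, long firstSynSendAt) {
        return firstSynSendAt == UNSET || isBefore(timestamp, firstSynSendAt);
    }

    public static void main(String[] args) {
        long firstSynSendAt = System.nanoTime();
        long later = System.nanoTime();
        System.out.println(predatesFirstSyn(later, firstSynSendAt));
        System.out.println(predatesFirstSyn(later, UNSET));
    }
}
```

The origin of System.nanoTime() is arbitrary, so 0 is only a convention for "unset" here and is not guaranteed to be distinct from a real reading.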

> In-flight shadow round requests
> ---
>
> Key: CASSANDRA-12653
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12653
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
> Reporter: Stefan Podkowinski
> Assignee: Stefan Podkowinski
> Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> Bootstrapping or replacing a node in the cluster requires gathering and 
> checking some host IDs or tokens by doing a gossip "shadow round" once before 
> joining the cluster. This is done by sending a gossip SYN to all seeds until we 
> receive a response with the cluster state, from where we can move on in the 
> bootstrap process. Receiving a response marks the shadow round as done and 
> calls {{Gossiper.resetEndpointStateMap}} to clean up the received state 
> again.
> The issue here is that at this point there might be other in-flight requests 
> and it's very likely that shadow round responses from other seeds will be 
> received afterwards, while the current state of the bootstrap process doesn't 
> expect this to happen (e.g. gossiper may or may not be enabled). 
> One side effect will be that MigrationTasks are spawned for each shadow round 
> reply except the first. Tasks might or might not execute based on whether at 
> execution time {{Gossiper.resetEndpointStateMap}} had been called, which 
> affects the outcome of {{FailureDetector.instance.isAlive(endpoint)}} at 
> start of the task. You'll see error log messages such as the following when 
> this happens:
> {noformat}
> INFO  [SharedPool-Worker-1] 2016-09-08 08:36:39,255 Gossiper.java:993 - 
> InetAddress /xx.xx.xx.xx is now UP
> ERROR [MigrationStage:1]2016-09-08 08:36:39,255 FailureDetector.java:223 
> - unknown endpoint /xx.xx.xx.xx
> {noformat}
> Although it isn't pretty, I currently don't see any serious harm from this, 
> but it would be good to get a second opinion (feel free to close as "won't 
> fix").
> /cc [~Stefania] [~thobbs]


