[ https://issues.apache.org/jira/browse/CASSANDRA-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Fenderbosch updated CASSANDRA-10637:
-----------------------------------------
    Description: 
We are writing a service to migrate data from various RDBMS tables into 
Cassandra. We write out a CSV from the source system, use CQLSSTableWriter to 
write sstables to disk, then call sstableloader to stream to the Cassandra 
cluster.

Right now, we either have to:

* return a CSV location from one Java process to a wrapper script which then 
kicks off sstableloader
* or call sstableloader via Runtime.getRuntime().exec
* or call BulkLoader.main from within our Java code, using a custom 
SecurityManager to trap the System.exit calls
* or subclass BulkLoader, placing the subclass in the org.apache.cassandra.tools 
package in order to access the package-scoped inner classes
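For comparison, the shell-out workaround can be sketched as follows. It uses ProcessBuilder, the more flexible equivalent of Runtime.getRuntime().exec, and assumes sstableloader is on the PATH. Only the -d option (initial hosts) is used, and the class name, host addresses and directory path are hypothetical. The sketch shows the main drawback: the caller gets back nothing but an exit code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the "exec sstableloader" workaround.
class SstableloaderExec {
    // Builds the argument list for the external sstableloader tool:
    // "-d" takes a comma-separated list of initial hosts, and the sstable
    // directory is the final positional argument.
    static List<String> buildCommand(List<String> hosts, String directory) {
        List<String> cmd = new ArrayList<>();
        cmd.add("sstableloader");
        cmd.add("-d");
        cmd.add(String.join(",", hosts));
        cmd.add(directory);
        return cmd;
    }

    // Starts sstableloader as a child process and waits for it to finish.
    // The exit code is the only failure signal the caller gets back.
    static int run(List<String> hosts, String directory)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(hosts, directory))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Print the command instead of running it, so no cluster is needed.
        List<String> cmd = buildCommand(List.of("10.0.0.1", "10.0.0.2"), "/tmp/sstables/ks/tbl");
        System.out.println(String.join(" ", cmd));
    }
}
```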

None of these solutions is ideal; we should be able to use the 
functionality of BulkLoader.main directly. I've extracted LoaderOptions into a 
top-level class that uses the builder pattern so that it can be used directly 
within a Java migration service.

LoaderOptions can now be created with a fluent builder interface:

    LoaderOptions options = LoaderOptions.builder()
            .connectionsPerHost(2)
            .directory(directory)
            .hosts(hosts)
            .build();

Alternatively, the builder can parse command-line arguments:

    LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
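To make the builder calls above concrete, here is a minimal, self-contained sketch of how an extracted options class with a fluent builder might look. It is not the code from the fork: the field types (a List<String> of hosts, a File directory), the default of one connection per host, and the tiny parseArgs that only understands "-d host1,host2" plus a trailing directory are all assumptions made to keep the example runnable.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the extracted top-level LoaderOptions class.
final class LoaderOptions {
    final int connectionsPerHost;
    final File directory;
    final List<String> hosts;

    private LoaderOptions(Builder b) {
        this.connectionsPerHost = b.connectionsPerHost;
        this.directory = b.directory;
        this.hosts = Collections.unmodifiableList(new ArrayList<>(b.hosts));
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private int connectionsPerHost = 1; // assumed default
        private File directory;
        private List<String> hosts = new ArrayList<>();

        Builder connectionsPerHost(int connectionsPerHost) {
            this.connectionsPerHost = connectionsPerHost;
            return this;
        }

        Builder directory(File directory) {
            this.directory = directory;
            return this;
        }

        Builder hosts(List<String> hosts) {
            this.hosts = new ArrayList<>(hosts);
            return this;
        }

        // Deliberately tiny parser: "-d host1,host2" sets the hosts and any
        // other argument is taken as the sstable directory.
        Builder parseArgs(String[] args) {
            for (int i = 0; i < args.length; i++) {
                if ("-d".equals(args[i])) {
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("-d requires a host list");
                    }
                    hosts(Arrays.asList(args[++i].split(",")));
                } else {
                    directory(new File(args[i]));
                }
            }
            return this;
        }

        // Validation errors surface as exceptions rather than System.exit(1).
        LoaderOptions build() {
            if (directory == null) {
                throw new IllegalArgumentException("an sstable directory is required");
            }
            if (hosts.isEmpty()) {
                throw new IllegalArgumentException("at least one host is required");
            }
            return new LoaderOptions(this);
        }
    }
}
```

Because both entry points end in build(), validation lives in one place whether the options come from Java code or from a command line.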

A new load method takes a LoaderOptions parameter and throws BulkLoadException 
on failure instead of calling System.exit(1).
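The switch from System.exit(1) to an exception can be illustrated with a similar sketch: a library-style load method reports failure by throwing, and only a thin command-line adapter maps that failure to an exit status. BulkLoadException is redeclared locally here, and the Streamer interface, the run helper and the argument layout (hosts first, then the directory) are hypothetical. The real method takes a LoaderOptions instance and streams sstables to the cluster; this sketch replaces that step with a pluggable callback so it runs without a cluster.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical checked exception standing in for the proposed BulkLoadException.
class BulkLoadException extends Exception {
    BulkLoadException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical hook for the streaming step, which is out of scope here.
@FunctionalInterface
interface Streamer {
    void stream(String directory, List<String> hosts) throws Exception;
}

class BulkLoaderSketch {
    // Library entry point: reports failure by throwing, never by exiting the
    // JVM, so an embedding service can catch the error and carry on.
    static void load(String directory, List<String> hosts, Streamer streamer)
            throws BulkLoadException {
        try {
            streamer.stream(directory, hosts);
        } catch (Exception e) {
            throw new BulkLoadException("Failed to load " + directory, e);
        }
    }

    // Command-line adapter: the only place that turns an exception into an
    // exit status. Expects args[0] = comma-separated hosts, args[1] = directory.
    static int run(String[] args, Streamer streamer) {
        if (args.length != 2) {
            System.err.println("usage: <host1,host2,...> <sstable directory>");
            return 1;
        }
        try {
            load(args[1], Arrays.asList(args[0].split(",")), streamer);
            return 0;
        } catch (BulkLoadException e) {
            System.err.println(e.getMessage() + ": " + e.getCause());
            return 1;
        }
    }

    public static void main(String[] args) {
        // No-op streamer so the sketch runs without a cluster.
        System.exit(run(args, (dir, hosts) -> { }));
    }
}
```

A migration service would call load directly and catch BulkLoadException, for example to retry or to mark a batch as failed, while the command-line tool keeps its exit-code behavior through main.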

The fork on GitHub can be found here:

https://github.com/efenderbosch/cassandra

  was:
We are writing a service to migrate data from various RDBMS tables into 
Cassandra. We write out a CSV from the source system, use CQLSSTableWriter to 
write sstables to disk, then call sstableloader to stream to the Cassandra 
cluster.

Right now, we either have to:

* return a CSV location from one Java process to a wrapper script which then 
kicks off sstableloader
* or call sstableloader via Runtime.getRuntime().exec
* or call BulkLoader.main from within our Java code, using a custom 
SecurityManager to trap the System.exit calls
* or subclass BulkLoader, placing the subclass in the org.apache.cassandra.tools 
package in order to access the package-scoped inner classes

None of these solutions is ideal; we should be able to use the 
functionality of BulkLoader.main directly. I've extracted LoaderOptions into a 
top-level class that uses the builder pattern so that it can be used directly 
within a Java migration service.

LoaderOptions can now be created with a fluent builder interface:

```java
LoaderOptions options = LoaderOptions.builder()
        .connectionsPerHost(2)
        .directory(directory)
        .hosts(hosts)
        .build();
```

Alternatively, the builder can parse command-line arguments:

```java
LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
```

A new load method takes a `LoaderOptions` parameter and throws 
`BulkLoadException` on failure instead of calling `System.exit(1)`.

The fork on GitHub can be found here:

https://github.com/efenderbosch/cassandra


> Extract LoaderOptions and refactor BulkLoader to be able to be used from 
> within existing Java code instead of just through main()
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10637
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10637
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Eric Fenderbosch
>            Priority: Minor
>             Fix For: 3.x
>
>
> We are writing a service to migrate data from various RDBMS tables into 
> Cassandra. We write out a CSV from the source system, use CQLSSTableWriter to 
> write sstables to disk, then call sstableloader to stream to the Cassandra 
> cluster.
> Right now, we either have to:
> * return a CSV location from one Java process to a wrapper script which then 
> kicks off sstableloader
> * or call sstableloader via Runtime.getRuntime().exec
> * or call BulkLoader.main from within our Java code, using a custom 
> SecurityManager to trap the System.exit calls
> * or subclass BulkLoader, placing the subclass in the 
> org.apache.cassandra.tools package in order to access the package-scoped 
> inner classes
> None of these solutions is ideal; we should be able to use the 
> functionality of BulkLoader.main directly. I've extracted LoaderOptions into a 
> top-level class that uses the builder pattern so that it can be used directly 
> within a Java migration service.
> LoaderOptions can now be created with a fluent builder interface:
>     LoaderOptions options = LoaderOptions.builder()
>             .connectionsPerHost(2)
>             .directory(directory)
>             .hosts(hosts)
>             .build();
> Alternatively, the builder can parse command-line arguments:
>     LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
> A new load method takes a LoaderOptions parameter and throws 
> BulkLoadException on failure instead of calling System.exit(1).
> The fork on GitHub can be found here:
> https://github.com/efenderbosch/cassandra



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
