Hi,

On Sun, Nov 8, 2009 at 3:56 PM, Jonathan Ellis <[email protected]> wrote:
> - You’ll easily double performance by setting the log level from DEBUG
> to INFO (unclear if you actually did this, so mentioning it for
> completeness)

No problem, I've checked and everything is already set to INFO.

> - 0.4.1 has bad default GC options. the defaults will be fixed for
> 0.4.2 and 0.5, but it’s easy to tweak for 0.4.1:
>
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/200910.mbox

Sorry, I can't find the post you are referring to, and I can't open this link on Mac OS.

> - it doesn't look like you're doing parallel inserts. you should have
> at least a few dozen to a few hundred threads if you want to measure
> throughput rather than just latency. run the client on a machine that
> is not running cassandra, since it can also use a decent amount of
> CPU.

By parallel, do you mean writing code that runs the inserts in several threads instead of one by one? If so, is the Thrift API thread safe? And how should I manage opening and closing the connections, e.g. does each thread open a single connection and close it at the end? (I've put a rough sketch of what I think you mean at the end of this mail.)

> - using batch_insert will be much faster than multiple single-column
> inserts to the same row

I've changed the code to use batch_insert, like this:

    public void insertChannelShow(String showId, String channelId, String airDate,
                                  String duration, String title, String parentShowId,
                                  String genre, String price, String subtitle) throws Exception {
        Calendar calendar = Calendar.getInstance();
        dateFormat.setCalendar(calendar);
        Date air = dateFormat.parse(airDate);
        calendar.setTime(air);
        String key = String.valueOf(calendar.getTimeInMillis()) + ":" + showId + ":" + channelId;
        long timestamp = System.currentTimeMillis();

        Map<String, List<ColumnOrSuperColumn>> insertDataMap =
                new HashMap<String, List<ColumnOrSuperColumn>>();
        List<ColumnOrSuperColumn> rowData = new ArrayList<ColumnOrSuperColumn>();
        rowData.add(new ColumnOrSuperColumn(
                new Column("duration".getBytes("UTF-8"), duration.getBytes("UTF-8"), timestamp), null));
        rowData.add(new ColumnOrSuperColumn(
                new Column("title".getBytes("UTF-8"), title.getBytes("UTF-8"), timestamp), null));
        rowData.add(new ColumnOrSuperColumn(
                new Column("parentShowId".getBytes("UTF-8"), parentShowId.getBytes("UTF-8"), timestamp), null));
        rowData.add(new ColumnOrSuperColumn(
                new Column("genre".getBytes("UTF-8"), genre.getBytes("UTF-8"), timestamp), null));
        rowData.add(new ColumnOrSuperColumn(
                new Column("price".getBytes("UTF-8"), price.getBytes("UTF-8"), timestamp), null));
        rowData.add(new ColumnOrSuperColumn(
                new Column("subtitle".getBytes("UTF-8"), subtitle.getBytes("UTF-8"), timestamp), null));

        insertDataMap.put("channelShow", rowData);
        cassandraClient.batch_insert("Keyspace1", key, insertDataMap, ConsistencyLevel.ONE);

        insertDataMap.clear();
        rowData.clear();
        insertDataMap = null;
        rowData = null;
    }

Is that what you had in mind?

Anyway, I've started a new small Amazon instance to run the inserts, so the client is not on one of the machines running Cassandra, and pointed it at one of the Cassandra server IPs. It doesn't improve anything: the client machine is at 1% CPU and the server machines are at 1% CPU.

The problem appears when the data is distributed between the two Cassandra servers. As long as all the data goes to the commitlog of the first server, everything is fine, around 2000 rows/second. But when the data starts going to the second server, throughput drops very sharply, to around 200 rows/second.

I've read that I can check latencies with JMX. That's fine, but I can't manage to connect to the JMX agent on Amazon: the parameters look correct, yet jconsole on my side still refuses to connect. (I've pasted the JVM options I'm using at the end of this mail.)

Is there anything else I can check?

Thanks
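P.S. Here is a rough sketch of what I understand by "parallel inserts", just so you can tell me if I'm on the wrong track: a fixed pool of threads where each thread opens its own Thrift connection (I'm assuming the generated client is not thread safe), runs its share of batch_inserts, and closes the connection at the end. The host, port, thread count and the insertOneRow helper are placeholders from my test setup, and the import package may differ between Cassandra/Thrift versions:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.Column;
    import org.apache.cassandra.service.ColumnOrSuperColumn;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ParallelInserter {

        private static final String HOST = "10.0.0.1"; // one of the Cassandra nodes (placeholder)
        private static final int PORT = 9160;          // Thrift port
        private static final int THREADS = 50;         // "a few dozen" threads, as suggested
        private static final int ROWS_PER_THREAD = 10000;

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(THREADS);
            for (int t = 0; t < THREADS; t++) {
                final int threadId = t;
                pool.submit(new Runnable() {
                    public void run() {
                        TTransport transport = null;
                        try {
                            // One connection and one client per thread, opened once
                            // and closed at the end, since I assume the Thrift client
                            // itself is not thread safe.
                            transport = new TSocket(HOST, PORT);
                            Cassandra.Client client =
                                    new Cassandra.Client(new TBinaryProtocol(transport));
                            transport.open();
                            for (int i = 0; i < ROWS_PER_THREAD; i++) {
                                insertOneRow(client, "row-" + threadId + "-" + i);
                            }
                        } catch (Exception e) {
                            e.printStackTrace();
                        } finally {
                            if (transport != null) transport.close();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }

        // Minimal batch_insert of a single column, just to exercise the cluster.
        private static void insertOneRow(Cassandra.Client client, String key) throws Exception {
            long ts = System.currentTimeMillis();
            List<ColumnOrSuperColumn> row = new ArrayList<ColumnOrSuperColumn>();
            row.add(new ColumnOrSuperColumn(
                    new Column("title".getBytes("UTF-8"), "test".getBytes("UTF-8"), ts), null));
            Map<String, List<ColumnOrSuperColumn>> cfMap =
                    new HashMap<String, List<ColumnOrSuperColumn>>();
            cfMap.put("channelShow", row);
            client.batch_insert("Keyspace1", key, cfMap, ConsistencyLevel.ONE);
        }
    }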

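P.P.S. For the JMX problem, these are the kinds of options I'm adding to the JVM arguments in cassandra.in.sh on the EC2 side. The java.rmi.server.hostname line and opening the port in the security group are my guesses at what is missing, so please correct me if the real problem is elsewhere:

    JVM_OPTS="$JVM_OPTS \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false \
        -Djava.rmi.server.hostname=<public DNS name of the EC2 instance>"

As far as I know, RMI also picks a second, random port for the actual connection, so even with the JMX port (8080 here) open in the EC2 security group, jconsole can still fail to connect if that second port is blocked.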