[ https://issues.apache.org/jira/browse/CASSANDRA-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909439#comment-13909439 ]

Bill Mitchell commented on CASSANDRA-6736:
------------------------------------------

{quote}Is this reproducible?{quote}

In a statistical sense, yes, I have encountered this failure repeatedly when 
trying to insert 100 million rows into two tables, one with a single wide row, 
and the other with narrow rows.  The failure does not happen at exactly the 
same point, which suggests it is a timing problem.  The test does use randomly 
generated data, so one cannot completely rule out a data dependency.  There are 
also reads intermixed with the inserts: for each segment of 10,000 rows, the 
test checks for duplicate values and removes them before inserting the new, 
unique values.
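
For reference, the insert loop is shaped roughly like the sketch below.  The 
keyspace, table, and column names are made up for illustration, the random data 
generation and the per-segment duplicate check are elided, and the DataStax Java 
driver calls are only meant to show the shape of a 10,000-row logged batch per 
table, not the exact test code.

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class BatchInsertSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test_ks");        // hypothetical keyspace

        final int segmentSize = 10000;                        // rows per table per batch
        final long totalRows = 100000000L;

        for (long seg = 0; seg < totalRows / segmentSize; seg++) {
            // The real test first reads back this segment's keys and drops
            // duplicates before building the batch; that step is omitted here.
            StringBuilder batch = new StringBuilder("BEGIN BATCH\n");   // logged batch
            for (int i = 0; i < segmentSize; i++) {
                long id = seg * segmentSize + i;
                // one table with a single wide row (fixed partition key) ...
                batch.append("INSERT INTO wide_rows (pk, ck, val) VALUES (0, ")
                     .append(id).append(", 'x');\n");
                // ... and one table with narrow rows (one row per partition)
                batch.append("INSERT INTO narrow_rows (pk, val) VALUES (")
                     .append(id).append(", 'x');\n");
            }
            batch.append("APPLY BATCH;");
            session.execute(batch.toString());
        }
        cluster.close();
    }
}
{code}

Switching BEGIN BATCH to BEGIN UNLOGGED BATCH is the only change between the 
logged and unlogged runs noted in the history below.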

Test history of the 100 million insert test:
2014-02-04 passed
2014-02-07 passed
2014-02-10 failed after 16,707,724 rows (changing the second table to have a few wide rows)
2014-02-10 failed after 6,215,471 rows
2014-02-12 failed after 21,038,110 rows (after installing 2.0.5)
2014-02-12 failed after 63,397,406 rows
2014-02-14 failed after 33,974,034 rows
2014-02-16 passed, using unlogged batch instead of logged batch
2014-02-18 failed after 43,151,232 rows, using logged batch
2014-02-20 failed after 54,263,560 rows
2014-02-21 passed, logged batch but with only 1,000 rows inserted in each table instead of 10,000

The failures were observed on 2.0.3, 2.0.4, and 2.0.5, so they are not 
restricted to the most recent build.  I don't have a hypothesis for why the 
test passed on 02-04 and 02-07.  

I tried the reduced batch size of 1,000 pairs of rows under the hypothesis that 
the failure has something to do with the large batch size producing a large 
commit log that needs to be compacted and cannot simply be marked as complete.  
Of course, one success does not necessarily mean that the problem cannot happen 
with the smaller batch size; it may just change the timing so that one is less 
likely to hit the failure.
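
Regarding the AccessDeniedException itself, this is only a guess at the 
mechanism, but the shape of the exception matches the classic Windows 
memory-mapping pitfall: a file whose mapping is still live cannot be deleted, 
and Java surfaces that as java.nio.file.AccessDeniedException.  As far as I can 
tell the commit log segments are memory-mapped, so the sketch below shows that 
OS-level behaviour in isolation.  It is not Cassandra's actual commit log code, 
and whether an un-released mapping is really what COMMIT-LOG-ALLOCATOR is 
racing with here is exactly what needs to be established.

{code:java}
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedSegmentDeleteSketch {
    public static void main(String[] args) throws IOException {
        Path segment = Files.createTempFile("CommitLog-sketch-", ".log");

        // Map the file for writing, write something, then close the channel.
        FileChannel channel = FileChannel.open(segment,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
        buffer.put("mutation data".getBytes());
        channel.close();

        try {
            // On Windows the still-live mapping pins the file, so the delete is
            // rejected with AccessDeniedException; on Linux it simply succeeds.
            Files.delete(segment);
            System.out.println("deleted immediately");
        } catch (AccessDeniedException e) {
            System.out.println("delete refused while the mapping is alive: " + e);
            // The delete can only succeed once the MappedByteBuffer has been
            // unmapped (i.e. garbage collected), which Java gives no direct
            // control over.
            segment.toFile().deleteOnExit();    // best-effort clean-up
        }

        // Keep the buffer reachable across the delete attempt.
        System.out.println("wrote " + buffer.position() + " bytes");
    }
}
{code}

If the allocator's delete races with something that still holds the segment 
mapped (or otherwise open without delete sharing), this is exactly the 
exception that results.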

> Windows7 AccessDeniedException on commit log 
> ---------------------------------------------
>
>                 Key: CASSANDRA-6736
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6736
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Windows 7, quad core, 8GB RAM, single Cassandra node, 
> Cassandra 2.0.5 with leakdetect patch from CASSANDRA-6283
>            Reporter: Bill Mitchell
>            Assignee: Joshua McKenzie
>         Attachments: 2014-02-18-22-16.log
>
>
> Similar to the data file deletion of CASSANDRA-6283, under heavy load with 
> logged batches, I am seeing a problem where the Commit log cannot be deleted:
>  ERROR [COMMIT-LOG-ALLOCATOR] 2014-02-18 22:15:58,252 CassandraDaemon.java 
> (line 192) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
>  FSWriteError in C:\Program Files\DataStax 
> Community\data\commitlog\CommitLog-3-1392761510706.log
>       at 
> org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:120)
>       at 
> org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:150)
>       at 
> org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:217)
>       at 
> org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
>       at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>       at java.lang.Thread.run(Unknown Source)
> Caused by: java.nio.file.AccessDeniedException: C:\Program Files\DataStax 
> Community\data\commitlog\CommitLog-3-1392761510706.log
>       at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>       at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>       at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>       at sun.nio.fs.WindowsFileSystemProvider.implDelete(Unknown Source)
>       at sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source)
>       at java.nio.file.Files.delete(Unknown Source)
>       at 
> org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:116)
>       ... 5 more
> (Attached in 2014-02-18-22-16.log is a larger excerpt from the cassandra.log.)
> In this particular case, I was trying to do 100 million inserts into two 
> tables in parallel, one with a single wide row and one with narrow rows, and 
> the error appeared after inserting 43,151,232 rows.  So it does take a while 
> to trip over this timing issue.  
> It may be aggravated by the size of the batches. This test was writing 10,000 
> rows to each table in a batch.  
> When I switch the same test from using a logged batch to an unlogged batch, 
> no such failure appears.  So the issue could be related to the use of large, 
> logged batches, or it could be that unlogged batches just change the 
> probability of hitting the failure.  



