Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2286#discussion_r187242849
  
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/iterator/CarbonOutputIteratorWrapper.java ---
    @@ -51,10 +57,16 @@ public void write(Object[] row) throws InterruptedException {
           // already might be closed forcefully
           return;
         }
    -    if (!loadBatch.addRow(row)) {
    -      loadBatch.readyRead();
    -      queue.put(loadBatch);
    -      loadBatch = new RowBatch(batchSize);
    +    // synchronization block is added for multi threaded scenarios where multiple instances of
    +    // writer thread are trying to add a row to the RowBatch. In those cases addition to given
    +    // batch size cannot be ensured and it can lead to ArrayIndexOutOfBound Exception or data
    +    // loss/mismatch issues
    +    synchronized (lock) {
    --- End diff --
    
    Even though the current writer interface is intended for single-threaded use, we can't prevent it from being used in a multi-threaded scenario, i.e. the write method being called by multiple threads on the same writer instance.
    1. If there is a single writer instance and only one thread calls the write interface, there is no meaningful impact on performance: the calls arrive one by one from the same thread, so the lock is only ever acquired by that thread and is never contended.
    2. If there is a single writer instance and multiple threads call the write interface on it, locking is required because adding a row to RowBatch is not synchronized and can lead to an ArrayIndexOutOfBoundsException or data loss/mismatch issues.
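
    To illustrate the two cases above, here is a minimal, self-contained sketch. It is not the actual CarbonOutputIteratorWrapper or RowBatch code: the class names SynchronizedBatchWriter and Batch, the helper methods totalRows and run, and the assumed addRow semantics (store the row, return false once the batch is full) are all hypothetical stand-ins. It only shows why guarding the "add row, then swap batch" sequence with a single lock keeps the row count intact when several threads share one writer.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class SynchronizedBatchWriter {

  /** Hypothetical stand-in for RowBatch: a fixed-size buffer that is NOT thread safe. */
  static final class Batch {
    private final Object[][] rows;
    private int size;

    Batch(int capacity) {
      rows = new Object[capacity][];
    }

    /** Stores the row; returns false once the batch is full (assumed semantics). */
    boolean addRow(Object[] row) {
      rows[size++] = row;
      return size < rows.length;
    }

    int size() {
      return size;
    }
  }

  private final Object lock = new Object();
  private final BlockingQueue<Batch> queue = new LinkedBlockingQueue<>();
  private final int batchSize;
  private Batch current;

  SynchronizedBatchWriter(int batchSize) {
    this.batchSize = batchSize;
    this.current = new Batch(batchSize);
  }

  /** Adding the row and swapping in a fresh batch happen atomically under one lock. */
  void write(Object[] row) throws InterruptedException {
    synchronized (lock) {
      if (!current.addRow(row)) {
        queue.put(current);
        current = new Batch(batchSize);
      }
    }
  }

  /** Counts rows in the queued batches plus the batch still being filled. */
  int totalRows() {
    synchronized (lock) {
      int total = current.size();
      for (Batch b : queue) {
        total += b.size();
      }
      return total;
    }
  }

  /** Starts several threads that share one writer and returns the rows recorded. */
  static int run(int threads, int rowsPerThread, int batchSize) {
    SynchronizedBatchWriter writer = new SynchronizedBatchWriter(batchSize);
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        try {
          for (int i = 0; i < rowsPerThread; i++) {
            writer.write(new Object[] {i});
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      workers[t].start();
    }
    try {
      for (Thread w : workers) {
        w.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return writer.totalRows();
  }
}
```

    With a single caller thread the lock in write is never contended, which corresponds to case 1. With several caller threads, removing the synchronized block from this sketch would let two threads increment the batch index at the same time, which is the kind of overflow or lost row described in case 2.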

