Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2286#discussion_r187242849

    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/iterator/CarbonOutputIteratorWrapper.java ---
    @@ -51,10 +57,16 @@ public void write(Object[] row) throws InterruptedException {
           // already might be closed forcefully
           return;
         }
    -    if (!loadBatch.addRow(row)) {
    -      loadBatch.readyRead();
    -      queue.put(loadBatch);
    -      loadBatch = new RowBatch(batchSize);
    +    // synchronization block is added for multi threaded scenarios where multiple instances of
    +    // writer thread are trying to add a row to the RowBatch. In those cases addition to given
    +    // batch size cannot be ensured and it can lead to ArrayIndexOutOfBound Exception or data
    +    // loss/mismatch issues
    +    synchronized (lock) {
    --- End diff --

    Even though the current writer interface is meant for a single thread, we can't block its use in a multi-threaded scenario, i.e. the write method being called by multiple threads using the same writer instance.
    1. If there is a single writer instance and only one thread calls the write interface, there is no impact on performance: the calls arrive one by one from the same thread, so the lock is only ever acquired by that thread and is never contended.
    2. If there is a single writer instance and multiple threads call the write interface on it, locking is required, because adding a row to RowBatch is not synchronized and can lead to ArrayIndexOutOfBoundsException or data loss/mismatch issues.
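
    To illustrate point 2, here is a minimal, self-contained sketch of the pattern, not the actual CarbonData code. The class `SynchronizedBatchWriterSketch`, its nested `Batch`, and the `runDemo` helper are hypothetical stand-ins for the wrapper and RowBatch, and the batch semantics (addRow returns false once the batch is full) are an assumption made for the example. Several threads share one writer instance, and the lock makes "add row, hand over the full batch, start a new batch" a single atomic step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the writer wrapper discussed above (not the CarbonData class).
public class SynchronizedBatchWriterSketch {

  // Simplified fixed-capacity batch. addRow is NOT thread-safe on its own.
  static final class Batch {
    private final Object[][] rows;
    private int size;

    Batch(int capacity) {
      rows = new Object[capacity][];
    }

    // Stores the row; returns true while the batch still has room for more rows.
    boolean addRow(Object[] row) {
      rows[size++] = row;
      return size < rows.length;
    }

    int size() {
      return size;
    }
  }

  private final Object lock = new Object();
  private final int batchSize;
  // Unbounded queue so that put() never blocks in this sketch.
  private final BlockingQueue<Batch> queue = new LinkedBlockingQueue<>();
  private Batch current;

  SynchronizedBatchWriterSketch(int batchSize) {
    this.batchSize = batchSize;
    this.current = new Batch(batchSize);
  }

  // Safe to call from several threads: add and batch swap happen under one lock.
  public void write(Object[] row) throws InterruptedException {
    synchronized (lock) {
      if (!current.addRow(row)) {
        queue.put(current);
        current = new Batch(batchSize);
      }
    }
  }

  // Hands over a partially filled batch, if any.
  public void close() throws InterruptedException {
    synchronized (lock) {
      if (current.size() > 0) {
        queue.put(current);
        current = new Batch(batchSize);
      }
    }
  }

  // Starts `threads` writers sharing one instance and returns the rows delivered.
  static int runDemo(int threads, int rowsPerThread) {
    SynchronizedBatchWriterSketch writer = new SynchronizedBatchWriterSketch(32);
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread worker = new Thread(() -> {
        try {
          for (int i = 0; i < rowsPerThread; i++) {
            writer.write(new Object[] {i});
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      workers.add(worker);
      worker.start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
      writer.close();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("demo interrupted", e);
    }
    int total = 0;
    for (Batch batch : writer.queue) {
      total += batch.size();
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println("rows delivered: " + runDemo(4, 1000));
  }
}
```

    With the synchronized block in place, every row written by the four threads ends up in exactly one batch. Without it, two threads could both pass the capacity check on the same batch, which is the ArrayIndexOutOfBoundsException and data loss case described in point 2. In the single-threaded case from point 1 the same code runs unchanged and the lock is simply never contended.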
---