DarwinKatanamp opened a new issue, #5517:
URL: https://github.com/apache/accumulo/issues/5517

   **Describe the bug**
   When doing a bulk import on tables that have a bloom filter enabled, external 
compactions fail with the following error:
   
   ```
   compactor_q1 org.apache.accumulo.compactor.Compactor 449 ERROR Compactor thread was interrupted waiting for compaction to start, cancelling job
   java.lang.UnsupportedOperationException
        at org.apache.accumulo.core.file.BloomFilterLayer$Reader.estimateOverlappingEntries(BloomFilterLayer.java:434)
        at org.apache.accumulo.compactor.Compactor.estimateOverlappingEntries(Compactor.java:635)
        at org.apache.accumulo.compactor.Compactor$2.lambda$initialize$0(Compactor.java:546)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at org.apache.accumulo.compactor.Compactor$2.initialize(Compactor.java:540)
        at org.apache.accumulo.compactor.Compactor.run(Compactor.java:751)
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
        at java.base/java.lang.Thread.run(Thread.java:1583)
   ```
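   My reading of the stack trace (a guess from the trace alone, not verified against the source): the bloom filter reader wrapper appears not to delegate `estimateOverlappingEntries` to the underlying reader, so the external compactor's call lands on a method that simply throws. A minimal sketch of that pattern — all class and method shapes below are illustrative stand-ins, not the actual Accumulo API:

   ```java
   // Illustrative stand-in for an estimation interface; not the real Accumulo type.
   interface OverlapEstimator {
       long estimateOverlappingEntries(String range);
   }

   // Stand-in for a delegating reader wrapper that leaves one method
   // unimplemented, which is what the trace suggests happens in
   // BloomFilterLayer$Reader.
   class BloomWrappedReader implements OverlapEstimator {
       @Override
       public long estimateOverlappingEntries(String range) {
           // Analogous to BloomFilterLayer.java:434 in the trace: the call is
           // not forwarded to the wrapped reader, it just throws.
           throw new UnsupportedOperationException();
       }
   }

   public class Main {
       public static void main(String[] args) {
           OverlapEstimator reader = new BloomWrappedReader();
           try {
               // Analogous to the compactor estimating overlapping entries.
               reader.estimateOverlappingEntries("row_a;row_b");
           } catch (UnsupportedOperationException e) {
               System.out.println("caught UnsupportedOperationException");
           }
       }
   }
   ```

   If that reading is right, any external compaction touching a bloom-filter-wrapped file would hit this path, matching the reproduction below.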
   
   **Versions (OS, Maven, Java, and others, as appropriate):**
    - Affected version(s) of this project: 2.1.3
   
   **To Reproduce**
   Steps to reproduce the behavior (or a link to an example repository that 
reproduces the problem):

   Start a local fluo-uno cluster with the default external compactors enabled, 
as defined in fluo-uno/install/accumulo-2.1.3/conf/cluster.yaml (branch: main, 
commit e8f3ba9; Accumulo version 2.1.3).

   Generate local bulk import files with 
accumulo-examples/src/main/java/org/apache/accumulo/examples/mapreduce/bulk/BulkIngestExample.java 
(branch: 2.1, commit 9d400cd). I disabled the client.tableOperations().importDirectory 
call and performed the bulk import in the Accumulo shell instead, and I changed 
the default 1k rows to 2M rows.
   
   Then copy the generated files to HDFS:
   ```
   hadoop fs -mkdir -p /tmp/bulkWork
   hadoop fs -copyFromLocal /.../accumulo-examples/tmp/bulkWork/ /tmp/bulkWork
   ```
   
   In the Accumulo shell:
   ```createtable test1```
   
   Enable the bloom filter:
   ```config -t test1 -s table.bloom.enabled=true```
   This setting is very likely not necessary (but I figured it would help 
trigger compactions):
   ```config -t test1 -s table.split.threshold=100K```
   
   Configure external compactions in the shell:
   ```
   config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
   config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"large","type":"external","queue":"q1"}]'
   config -t test1 -s table.compaction.dispatcher.opts.service=cs1
   ```
   
   Do the bulk load:
   ```importdirectory -t test1 /tmp/bulkWork/bulkWork/files true```
   
   Start a compaction in the shell:
   ```compact -t test1 -w```
   
   This results in the errors shown above appearing in the Monitor.
   
   **Expected behavior**
   No errors when externally compacting bulk-loaded, bloom-filter-enabled tables.
   
   **Additional context**
   A note unrelated to this problem: I had to disable
   ```              <arg>-Xlint:all</arg>```
   in the root pom.xml for the project to compile (from a clean clone, with Java 21).
   

