Hi Tao,

> I am not sure what do you mean by ulimit issues

When so many small SST files are created, the job runs into the limit on the
maximum number of open files (ulimit -n).

I dug into RocksDB's (plethora of) options today and identified the option
that causes the 2.3 MB sizes: target_file_size_base
<https://github.com/facebook/rocksdb/blob/167fb919a55e8dc5d12d4debe7965208029e3505/include/rocksdb/options.h#L396>

That sets the target output file size for compaction. Since it is not
available as a configuration from Samza, I had to create another
RocksDbKeyValueStorageEngineFactory and set the required options on the
RocksDB handle directly, like this:

class RocksDbBulkKeyValueStorageEngineFactory[K, V] extends
    BaseKeyValueStorageEngineFactory[K, V] {
  /**
   * A KeyValueStore instance optimized for the bulk write and read use case.
   */
  override def getKVStore(storeName: String,
                          storeDir: File,
                          registry: MetricsRegistry,
                          changeLogSystemStreamPartition: SystemStreamPartition,
                          containerContext: SamzaContainerContext): KeyValueStore[Array[Byte], Array[Byte]] = {
    val storageConfig = containerContext.config.subset("stores." + storeName + ".", true)
    val rocksDbMetrics = new KeyValueStoreMetrics(storeName, registry)
    val rocksDbOptions = RocksDbKeyValueStore.options(storageConfig, containerContext)
    val rocksDbWriteOptions = new WriteOptions().setDisableWAL(true)

    // Produce a few large SST files instead of many small ones.
    rocksDbOptions.setTargetFileSizeBase(Integer.MAX_VALUE)
    rocksDbOptions.setMaxBytesForLevelBase(Integer.MAX_VALUE)
    rocksDbOptions.setSourceCompactionFactor(Integer.MAX_VALUE)
    rocksDbOptions.setLevelZeroSlowdownWritesTrigger(-1) // no slowdown at all

    new RocksDbKeyValueStore(storeDir, rocksDbOptions, rocksDbWriteOptions, rocksDbMetrics)
  }
}

This produced large SST files, which is great for bulk writes and reads
during joins.

Those are useful configurations, and should probably be exposed via Samza's
RocksDB configuration.
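As a sketch of what exposing them could look like, a factory could pick up
store-scoped overrides before applying them to the RocksDB options. The
helper and the property name below are hypothetical, not existing Samza
configuration keys, and plain Scala over a Map stands in for Samza's Config
for illustration:

```scala
// Hypothetical sketch: resolving store-scoped RocksDB overrides from
// flat job configuration. Property names here are assumptions, not
// existing Samza keys.
object RocksDbConfigSketch {
  // Mirror Config.subset("stores." + storeName + ".", true): keep only
  // this store's properties, with the prefix stripped.
  def subset(config: Map[String, String], storeName: String): Map[String, String] = {
    val prefix = s"stores.$storeName."
    config.collect { case (k, v) if k.startsWith(prefix) => (k.stripPrefix(prefix), v) }
  }

  // Parse an optional long-valued property, falling back to a default.
  def longOpt(config: Map[String, String], key: String, default: Long): Long =
    config.get(key).map(_.toLong).getOrElse(default)
}
```

A job could then set something like
stores.join-store.rocksdb.compaction.target.file.size.bytes, and the factory
would feed the resolved value into rocksDbOptions.setTargetFileSizeBase
instead of hard-coding Integer.MAX_VALUE.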

Thanks,

KN.


On Thu, Dec 17, 2015 at 2:53 AM, Tao Feng <fengta...@gmail.com> wrote:

> Hi Kishore,
>
> I am not sure what do you mean by ulimit issues, could you help to explain
> a little bit?
>
> And I am not sure the user can control the size of SST files, as each SST
> file corresponds to one sorted run. My understanding is that the SST file
> size depends on how many memtables get flushed. In your case, if only one
> memtable is flushed (64 MB), the raw SST file size will be (64 MB + index
> size). The index is used to locate data at read time. But since Samza by
> default applies RocksDB compression (Snappy), the actual SST file size
> would be (64 MB + index size) * the Snappy compression ratio.
>
> But RocksDB compaction can also create SST files. I just wrote a simple
> benchmark to mimic what you describe, and I also observed lots of 2-3 MB
> files being created. I am not very familiar with the RocksDB compaction
> process, but if you look at your Samza RocksDB log in the "state"
> directory, you will find "table_file_creation" events, which correspond to
> SST file creation. For the small SST files in my case, creation was
> triggered by compaction. This may be why you see many small SST files.
>
> HTH,
> -Tao
>
> On Wed, Dec 16, 2015 at 5:06 AM, Kishore N C <kishor...@gmail.com> wrote:
>
> > Hi,
> >
> > During a catch-up job that might require reprocessing of 100s of millions
> > of records, I wanted to tweak RocksDB configuration to ensure that it's
> > optimized for bulk writes. According to the documentation here
> > <
> >
> https://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html#task-opts
> > >,
> > setting stores.store-name.container.write.buffer.size.bytes would set the
> > size of the memtable, and also "determines the size of RocksDB's segment
> > files". For a job, I went ahead and set this property to 268435456
> > (256 MB), and verified that the configuration was correctly picked up and
> > displayed in the task log. However, the task still ended up creating
> > hundreds of 2.3 MB SST files, eventually leading to ulimit issues. There
> > were 4 tasks running in each container, so I would have expected SST file
> > sizes of 64 MB, but that was not to be.
> >
> > Is my understanding of this configuration wrong? How do I control the
> size
> > of the SST files produced by RocksDB?
> >
> > Thanks,
> >
> > KN.
> >
>



-- 
It is our choices that show what we truly are,
far more than our abilities.
