Problem scaling Extractors with data volume

Gurjinder Singh Rathore Thu, 28 Sep 2017 17:54:05 -0700

Hi,

I'm using Gobblin (embedded) in my project to transfer loads of more than 10
million rows at a time from a single table. When I run the load, Gobblin
starts throwing errors, because the number of Extractor instances appears to
grow almost linearly with the number of rows in my source table. This
results in too many connections being opened, and at some point no more
Extractor instances can be created because the DBMS has run out of
available connections. This is becoming really painful.


I dug into the code, and found that I could use the following settings to
limit the number of extractors created simultaneously:

extract.limit.enabled=true
extract.limit.type=pool
extract.limit.pool.size=10

However, TaskContext.java has this check:

        Limiter limiter = DefaultLimiterFactory.newLimiter(this.taskState);
        if (!(limiter instanceof NonRefillableLimiter)) {
          throw new IllegalArgumentException("The Limiter used with an Extractor should be an instance of "
              + NonRefillableLimiter.class.getSimpleName());
        }

This is where I got stuck. I really need to use PoolBasedLimiter, which is
not a NonRefillableLimiter. How can I get around this problem?

*A relevant thread I found on GitHub*:
https://github.com/apache/incubator-gobblin/pull/132
I understand that the above-mentioned check was added because
LimitingExtractorDecorator doesn't close the limiters. But why doesn't it?
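
To make sure I understand the concern behind that check, here is a minimal
sketch in plain Java. It uses java.util.concurrent.Semaphore, not Gobblin's
actual classes; PoolLimiter and runTasks are names I made up for illustration,
and I am only assuming that a pool-based limiter behaves roughly like this.
The idea: permits taken from a pool have to be handed back, so if nothing
closes the limiter, the pool runs dry once pool.size extractors have started.

```java
import java.util.concurrent.Semaphore;

public class PoolLimiterSketch {

    /** Hypothetical stand-in for a pool-based limiter: a fixed number of permits. */
    static final class PoolLimiter {
        private final Semaphore permits;

        PoolLimiter(int poolSize) {
            this.permits = new Semaphore(poolSize);
        }

        /** Tries to take one permit without blocking; false if the pool is empty. */
        boolean tryAcquire() {
            return permits.tryAcquire();
        }

        /** Returns one permit to the pool. */
        void release() {
            permits.release();
        }

        int available() {
            return permits.availablePermits();
        }
    }

    /** Simulates extractors run one after another; returns how many got a permit. */
    static int runTasks(PoolLimiter limiter, int tasks, boolean releaseOnClose) {
        int started = 0;
        for (int i = 0; i < tasks; i++) {
            if (!limiter.tryAcquire()) {
                continue; // pool exhausted: this extractor cannot start
            }
            started++;
            // ... the extractor would read its rows here ...
            if (releaseOnClose) {
                limiter.release(); // permit goes back when the extractor closes
            }
        }
        return started;
    }

    public static void main(String[] args) {
        // Permits are returned on close: all 100 simulated extractors can run.
        System.out.println("released on close: "
            + runTasks(new PoolLimiter(10), 100, true));
        // Permits are never returned: only the first 10 get through.
        System.out.println("never released:    "
            + runTasks(new PoolLimiter(10), 100, false));
    }
}
```

If that is roughly the situation, then releasing the permit when the extractor
is closed would seem to be enough to make a pool-based limiter safe here, but I
may be missing something about how the decorator is used.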

Regards,
Gurjinder
