[
https://issues.apache.org/jira/browse/IMPALA-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463360#comment-16463360
]
Quanlong Huang commented on IMPALA-6729:
----------------------------------------
[~tianyiwang], thanks for your precise summary!
Let me share some experimental results from Impala 2.12.0-rc1. Thanks to the
new catalogd web UI, I can now see the memory consumption of a table's
metadata. I just uploaded a snapshot taken after the two huge tables were
loaded (see attachments). Both queries hit an OutOfMemoryError. The
upp_published_prod and upp_generated_prod tables both consume ~10GB of
memory; caching that much metadata brings little benefit. The catalogd
finally failed to send out the catalog updates. Here is the log:
{code:java}
I0503 19:29:42.871878 32749 TableLoader.java:58] Loading metadata for: default.upp_raw_prod
I0503 19:29:42.872372 32748 TableLoadingMgr.java:70] Loading metadata for table: default.upp_raw_prod
I0503 19:29:42.872709 32748 TableLoadingMgr.java:72] Remaining items in queue: 0. Loads in progress: 1
I0503 19:29:43.332053 32749 HdfsTable.java:1276] Fetching partition metadata from the Metastore: default.upp_raw_prod
W0503 19:29:43.493882 32749 HiveConf.java:2897] HiveConf of name hive.access.conf.url does not exist
I0503 19:29:43.736603 32749 HdfsTable.java:1280] Fetched partition metadata from the Metastore: default.upp_raw_prod
I0503 19:29:45.541966 32749 HdfsTable.java:902] Loading file and block metadata for 794 paths for table default.upp_raw_prod using a thread pool of size 5
I0503 19:30:20.067162 32749 HdfsTable.java:942] Loaded file and block metadata for default.upp_raw_prod
I0503 19:30:20.071476 32749 TableLoader.java:97] Loaded metadata for: default.upp_raw_prod
I0503 19:30:22.708765 31655 catalog-server.cc:479] Collected update: TABLE:default.upp_raw_prod, version=17937, original size=94545234, compressed size=25210956
I0503 19:30:22.713205 31655 catalog-server.cc:479] Collected update: CATALOG_SERVICE_ID, version=17937, original size=49, compressed size=52
I0503 19:30:22.804253 31660 catalog-server.cc:243] A catalog update with 2 entries is assembled. Catalog version: 17937 Last sent catalog version: 17936
I0503 19:30:38.328853 32748 TableLoadingMgr.java:70] Loading metadata for table: default.upp_generated_prod
I0503 19:30:38.329080 33260 TableLoader.java:58] Loading metadata for: default.upp_generated_prod
I0503 19:30:38.329207 32748 TableLoadingMgr.java:72] Remaining items in queue: 0. Loads in progress: 1
I0503 19:30:38.586901 33260 HdfsTable.java:1276] Fetching partition metadata from the Metastore: default.upp_generated_prod
I0503 19:31:03.355075 33260 HdfsTable.java:1280] Fetched partition metadata from the Metastore: default.upp_generated_prod
I0503 19:32:59.218356 33260 HdfsTable.java:902] Loading file and block metadata for 104474 paths for table default.upp_generated_prod using a thread pool of size 5
I0503 19:43:02.928905 42338 webserver.cc:361] Webserver: error reading: Resource temporarily unavailable
I0503 19:43:42.053800 33260 HdfsTable.java:942] Loaded file and block metadata for default.upp_generated_prod
I0503 19:43:42.054765 33260 TableLoader.java:97] Loaded metadata for: default.upp_generated_prod
I0503 19:43:51.832700 32748 jni-util.cc:230] java.lang.OutOfMemoryError
        at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
        at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
        at org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:211)
        at org.apache.impala.thrift.THdfsFileDesc$THdfsFileDescStandardScheme.write(THdfsFileDesc.java:366)
        at org.apache.impala.thrift.THdfsFileDesc$THdfsFileDescStandardScheme.write(THdfsFileDesc.java:329)
        at org.apache.impala.thrift.THdfsFileDesc.write(THdfsFileDesc.java:280)
        at org.apache.impala.thrift.THdfsPartition$THdfsPartitionStandardScheme.write(THdfsPartition.java:2044)
        at org.apache.impala.thrift.THdfsPartition$THdfsPartitionStandardScheme.write(THdfsPartition.java:1777)
        at org.apache.impala.thrift.THdfsPartition.write(THdfsPartition.java:1602)
        at org.apache.impala.thrift.THdfsTable$THdfsTableStandardScheme.write(THdfsTable.java:1243)
        at org.apache.impala.thrift.THdfsTable$THdfsTableStandardScheme.write(THdfsTable.java:1071)
        at org.apache.impala.thrift.THdfsTable.write(THdfsTable.java:940)
        at org.apache.impala.thrift.TTable$TTableStandardScheme.write(TTable.java:1628)
        at org.apache.impala.thrift.TTable$TTableStandardScheme.write(TTable.java:1399)
        at org.apache.impala.thrift.TTable.write(TTable.java:1208)
        at org.apache.impala.thrift.TCatalogObject$TCatalogObjectStandardScheme.write(TCatalogObject.java:1241)
        at org.apache.impala.thrift.TCatalogObject$TCatalogObjectStandardScheme.write(TCatalogObject.java:1098)
        at org.apache.impala.thrift.TCatalogObject.write(TCatalogObject.java:938)
        at org.apache.impala.thrift.TCatalogUpdateResult$TCatalogUpdateResultStandardScheme.write(TCatalogUpdateResult.java:904)
        at org.apache.impala.thrift.TCatalogUpdateResult$TCatalogUpdateResultStandardScheme.write(TCatalogUpdateResult.java:776)
        at org.apache.impala.thrift.TCatalogUpdateResult.write(TCatalogUpdateResult.java:678)
        at org.apache.impala.thrift.TResetMetadataResponse$TResetMetadataResponseStandardScheme.write(TResetMetadataResponse.java:359)
        at org.apache.impala.thrift.TResetMetadataResponse$TResetMetadataResponseStandardScheme.write(TResetMetadataResponse.java:321)
        at org.apache.impala.thrift.TResetMetadataResponse.write(TResetMetadataResponse.java:269)
        at org.apache.thrift.TSerializer.serialize(TSerializer.java:79)
        at org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:160)
I0503 19:43:51.897089 32748 status.cc:125] OutOfMemoryError: null
    @ 0x174fbb3  impala::Status::Status()
    @ 0x1bba616  impala::JniUtil::GetJniExceptionMsg()
    @ 0x17405aa  impala::JniUtil::CallJniMethod<>()
    @ 0x173e3f7  impala::Catalog::ResetMetadata()
    @ 0x170ac24  CatalogServiceThriftIf::ResetMetadata()
    @ 0x177ae0c  impala::CatalogServiceProcessor::process_ResetMetadata()
    @ 0x1779c8a  impala::CatalogServiceProcessor::dispatchCall()
    @ 0x16f45f0  apache::thrift::TDispatchProcessor::process()
    @ 0x18cf84b  apache::thrift::server::TAcceptQueueServer::Task::run()
    @ 0x18c7319  impala::ThriftThread::RunRunnable()
    @ 0x18c8a1d  boost::_mfi::mf2<>::operator()()
    @ 0x18c88b3  boost::_bi::list3<>::operator()<>()
    @ 0x18c85ff  boost::_bi::bind_t<>::operator()()
    @ 0x18c8512  boost::detail::function::void_function_obj_invoker0<>::invoke()
    @ 0x190cf00  boost::function0<>::operator()()
    @ 0x1c2710d  impala::Thread::SuperviseThread()
    @ 0x1c2f5e3  boost::_bi::list5<>::operator()<>()
    @ 0x1c2f507  boost::_bi::bind_t<>::operator()()
    @ 0x1c2f4ca  boost::detail::thread_data<>::run()
    @ 0x2ed6c3a  thread_proxy
    @ 0x7efeb86e6184  start_thread
    @ 0x7efeb8412ffd  clone
{code}
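A note on where the error surfaces: the stack trace shows TSerializer flattening the whole TResetMetadataResponse into a single in-memory ByteArrayOutputStream, whose backing byte[] cannot exceed roughly Integer.MAX_VALUE bytes. The following is a minimal sketch (not the Impala code; a simplified model of the JDK's ByteArrayOutputStream growth policy, using long arithmetic where the real code uses int) of why a multi-gigabyte serialized table fails on this path well before 10GB:

{code:java}
public class BaosGrowthSketch {
    // Practical cap on Java array sizes, as used by ByteArrayOutputStream.
    static final long MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Simplified model of ByteArrayOutputStream.grow()/hugeCapacity():
    // double the buffer until it covers minCapacity, and fail once the
    // required capacity can no longer fit in a single Java array.
    static long newCapacity(long oldCapacity, long minCapacity) {
        long newCap = oldCapacity << 1;
        if (newCap < minCapacity) newCap = minCapacity;
        if (newCap > MAX_ARRAY_SIZE) {
            if (minCapacity > MAX_ARRAY_SIZE) {
                throw new OutOfMemoryError("required capacity exceeds max array size");
            }
            newCap = MAX_ARRAY_SIZE; // hugeCapacity() clamps here
        }
        return newCap;
    }

    public static void main(String[] args) {
        // Doubling is fine for ordinary tables...
        System.out.println(newCapacity(1024, 1500)); // 2048
        // ...but a ~10GB serialized table can never fit in one byte[].
        try {
            newCapacity(1L << 30, 10L * 1024 * 1024 * 1024);
        } catch (OutOfMemoryError e) {
            System.out.println("OutOfMemoryError, as in the log above");
        }
    }
}
{code}

So even with a larger JVM heap, this single-buffer serialization path would still impose a hard ceiling of about 2GB per serialized catalog object.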
These are tables from our production environment. There is no small-files
problem, since the block count exceeds the file count. For example, here are
the metrics of the upp_published_prod table:
{code:java}
memory-estimate-bytes: 10277300788
num-blocks: 13621520
num-files: 9098905
num-partitions: 1799131
total-file-size-bytes: 1479550865896345
{code}
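To make the scale concrete, those numbers can be sanity-checked with quick arithmetic (a throwaway sketch; the constants are copied verbatim from the metrics above):

{code:java}
public class TableMetricsCheck {
    // Metrics reported by the catalogd web UI for upp_published_prod.
    static final long MEMORY_ESTIMATE_BYTES = 10_277_300_788L;
    static final long NUM_BLOCKS = 13_621_520L;
    static final long NUM_FILES = 9_098_905L;
    static final long TOTAL_FILE_SIZE_BYTES = 1_479_550_865_896_345L;

    static long avgFileSizeBytes()      { return TOTAL_FILE_SIZE_BYTES / NUM_FILES; }
    static double blocksPerFile()       { return (double) NUM_BLOCKS / NUM_FILES; }
    static long metadataBytesPerFile()  { return MEMORY_ESTIMATE_BYTES / NUM_FILES; }

    public static void main(String[] args) {
        // ~155 MB per file on average, so no small-files problem.
        System.out.printf("avg file size: %d MB%n", avgFileSizeBytes() >> 20);
        // ~1.5 blocks per file: most files span more than one block.
        System.out.printf("blocks per file: %.2f%n", blocksPerFile());
        // The catalogd caches on the order of 1 KB of metadata per file.
        System.out.printf("cached metadata per file: ~%d bytes%n", metadataBytesPerFile());
    }
}
{code}

In other words, the ~10GB cache is driven purely by the sheer number of files and partitions, not by badly laid out data.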
[~alex.behm], based on these results I believe we really need an option to
disable the file & block metadata cache. I can work on this if you don't have
the capacity, since this blocks us from using Impala widely in our
environment. Do you have any suggestions?
cc'ing some people who are not on the watchers list: [[email protected]] [~jbapple]
> Provide startup option to disable file and block location cache
> ---------------------------------------------------------------
>
> Key: IMPALA-6729
> URL: https://issues.apache.org/jira/browse/IMPALA-6729
> Project: IMPALA
> Issue Type: New Feature
> Components: Catalog
> Reporter: Quanlong Huang
> Priority: Major
> Attachments: Screen Shot 2018-05-04 at 12.12.21 PM.png
>
>
> In HDFS, scheduling PlanFragments according to block locations can improve
> the locality of queries. However, every coin has two sides. There are
> scenarios in which loading and keeping the block locations brings no benefit
> and sometimes even becomes a burden.
> {panel:title=Scenario 1}
> In a Hadoop cluster with ~1000 nodes, the Impala cluster is deployed on
> only tens of compute nodes (i.e. nodes with small disks but larger memory
> and powerful CPUs). Data locality is poor since most of the blocks have no
> replicas on the Impala nodes. Network bandwidth is 1 Gbit/s, so remote
> reads are acceptable. Queries are only required to finish within 5 minutes.
>
> Block location info is useless here since the scheduler always comes up
> with the same plan.
> {panel}
> {panel:title=Scenario 2}
> load_catalog_in_background is set to false since there are several PB of
> data in the Hive warehouse. If it were set to true, the Impala cluster
> would fail to start up: it would keep waiting for block locations to load,
> eventually fill up the memory of catalogd, and crash it.
> Accessing a Hive table containing >10,000 partitions for the first time
> stalls for a long time, and for some large tables it can't even finish.
> Users are annoyed when they only want to describe the table or select a few
> of its partitions.
>
> Block location info is a burden here since loading it dominates the query
> time, and in the end only a small portion of it is ever used.
> {panel}
> {panel:title=Scenario 3}
> There are many ETL pipelines ingesting data into the Hive warehouse. Some
> tables are updated by replacing the whole data set; some partitioned tables
> are updated by inserting new partitions.
> Ad hoc queries used to be served by Presto. To introduce Impala as a
> replacement for Presto, we would have to add a REFRESH step at the end of
> each pipeline, which takes great effort (many code changes to the existing
> warehouse).
> IMPALA-4272 could solve this but has seen no progress. If the file and
> block location metadata cache could be disabled, things would be simple.
> {panel}
> IMPALA-3127 is related, but we hope it is possible to avoid keeping the
> block locations at all.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)