Re: big issue on retrieving 400MB data

Jeszy Fri, 28 Apr 2017 07:27:03 -0700

Hey,

It looks like all the time is spent waiting for the client to fetch the results:
     - ClientFetchWaitTimer: 17m31s


Try doing:
impala-shell -B -q '<query>'

HTH

2017-04-28 14:51 GMT+02:00 吴朱华 <[email protected]>:
> Maybe I just paste some main thing on mail , and congratulation on IPO
> thing.
>
> Unregister query: 17m42s (17m42s)
>
> Fetched 317246 row(s) in 1062.84s
> Query Runtime Profile:
> Query (id=8149e2439f43b15a:f08e570d7fbf1085):
>   Summary:
>     Session ID: 35436d1112b79287:9045c79c795858a5
>     Session Type: BEESWAX
>     Start Time: 2017-04-28 11:50:00.292615000
>     End Time: 2017-04-28 12:07:43.133484000
>     Query Type: QUERY
>     Query State: FINISHED
>     Query Status: OK
>     Impala Version: impalad version 2.5.0-cdh5-INTERNAL RELEASE (build
> 43880282edc04c03c162bbea6fc85b5388e7fdde)
>     User: impala
>     Connected User: impala
>     Delegated User:
>     Network Address: ::ffff:10.44.10.186:36325
>     Default Db: sjzy
>     Sql Statement: select
> MRECID,UNITID,PCQDM,PCQMC,PCXQDM,PCXQMC,DM,H001,H002,H003,H021,H022,H023,H024,H025,H026A,H026B,H026C,H026D,H026E,H026F,H026G,H027,H028,H029,H030,H031,H032,H033,H034,H035,H036,H037A,H037B,H037C,H038,H039,H040,H041,H042,H043A,H043B,H043C,H043D,H043E,H043F,H043G,H043H,H043I,H043J,H043K,H043L,H044A,H044B,H044C,H044D,H044E,H044F,H044G,H044H,H044I,H050,H051,H052,H053,H054,H055,H056,H061,H062,H063,H064,H065,H066,H070,H071,H072,H073,H074,H075,H080,H100,H111,H112,H113,H120,H200,H201,H202,H203,H204,H205,H206,H207,H208,H209,H210,H211,H300,H320,H321,H322,H323,H324,H400,H401,H402,H403,H404,H405,H406,H500,H600,H601,H602,H603,H604,H605,H606,H607,H608,H609,H610,H611,H612,H613,H614,H615,H616,H621A,H621B,H621C,H621D,H621E,H621F,H622A,H622B,H622C,H801,H802,H803,H804,H901,H902,H903
> FROM NP_2017_NP601 WHERE DS_AREACODE LIKE '445281%'
>     Coordinator: node1.sky.org:22000
>     Query Options (non default):
>     Plan:
> ----------------
> Estimated Per-Host Requirements: Memory=4.50GB VCores=1
>
> 01:EXCHANGE [UNPARTITIONED]
> |  hosts=4 per-host-mem=unavailable
> |  tuple-ids=0 row-size=1.67KB cardinality=1155911
> |
> 00:SCAN HDFS [sjzy.np_2017_np601, RANDOM]
>    partitions=1/1 files=20 size=1.06GB
>    predicates: DS_AREACODE LIKE '445281%'
>    table stats: 11559109 rows total
>    column stats: all
>    hosts=4 per-host-mem=4.50GB
>    tuple-ids=0 row-size=1.67KB cardinality=1155911
> ----------------
>     Estimated Per-Host Mem: 4831838208
>     Estimated Per-Host VCores: 1
>     Request Pool: default-pool
>     ExecSummary:
> Operator       #Hosts  Avg Time  Max Time    #Rows  Est. #Rows   Peak Mem
>  Est. Peak Mem  Detail
> -------------------------------------------------------------------------------------------------------------
> 01:EXCHANGE         1  32.314ms  32.314ms  317.25K       1.16M          0
>      -1.00 B  UNPARTITIONED
> 00:SCAN HDFS       20   1s137ms   1s348ms  317.25K       1.16M  163.85 MB
>      4.50 GB  sjzy.np_2017_np601
>     Planner Timeline: 53.683ms
>        - Analysis finished: 24.565ms (24.565ms)
>        - Equivalence classes computed: 26.389ms (1.823ms)
>        - Single node plan created: 33.607ms (7.218ms)
>        - Runtime filters computed: 33.684ms (76.568us)
>        - Distributed plan created: 39.125ms (5.441ms)
>        - Planning finished: 53.683ms (14.558ms)
>     Query Timeline: 17m42s
>        - Start execution: 43.792us (43.792us)
>        - Planning finished: 60.640ms (60.596ms)
>        - Ready to start 20 remote fragments: 65.111ms (4.471ms)
>        - All 20 remote fragments started: 74.572ms (9.461ms)
>        - Rows available: 744.300ms (669.728ms)
>        - First row fetched: 790.128ms (45.828ms)
>        - Unregister query: 17m42s (17m42s)
>   ImpalaServer:
>      - ClientFetchWaitTimer: 17m31s
>      - RowMaterializationTimer: 10s024ms
>
> 2017-04-28 19:44 GMT+08:00 Jim Apple <[email protected]>:
>
>> dev@ does not appear to accept attachments. You can upload it somewhere
>> and
>> post a link, though.
>>
>> On Thu, Apr 27, 2017 at 11:35 PM, 吴朱华 <[email protected]> wrote:
>>
>> > Oops, I just resend it, you know the chinese network^_^
>> >
>> > 2017-04-28 14:20 GMT+08:00 Mostafa Mokhtar <[email protected]>:
>> >
>> >> Btw the profile wasn't attached.
>> >> Please resend.
>> >>
>> >> On Thu, Apr 27, 2017 at 11:11 PM, 吴朱华 <[email protected]> wrote:
>> >>
>> >>> Profile is in the attachment, thanks
>> >>>
>> >>>
>> >>> 2017-04-28 13:10 GMT+08:00 Dimitris Tsirogiannis <
>> >>> [email protected]>:
>> >>>
>> >>>> Maybe you also want to post some information about the schema (how
>> wide
>> >>>> your table is, does it use nested types, etc) as well as the profile
>> of
>> >>>> the
>> >>>> slow query.
>> >>>>
>> >>>> Dimitris
>> >>>>
>> >>>> On Thu, Apr 27, 2017 at 9:30 PM, 吴朱华 <[email protected]> wrote:
>> >>>>
>> >>>> > Hi guys:
>> >>>> > we can facing a big issue when select * from a big table.
>> >>>> > The performance is 17minutes for retrieving 400MB data. Even slow
>> >>>> under
>> >>>> > JDBC situation.
>> >>>> > Is there anyway to improve it?^_^
>> >>>> >
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>

Re: big issue on retrieving 400MB data

Reply via email to