Also on your location clause you should not reference the same file more
than one time.

If you want to scale gpfdist process you need to use different range on
each port for a same server. (Not sure if I explain myself :))

If you use one gpfdist per server performance should be fine. One gpfdist
process usually can do 250MB/s if underlying infra allows.

Rgds

--- Sent from my Nexus 5x

Em 21 de set de 2016 12:14 PM, "Vineet Goel" <vvin...@apache.org> escreveu:

> Your default_hash_table_bucket_number value is set to 6. Typically, this
> should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you
> should set this value to 18. Any time you change this parameter, you should
> redistribute your HASH distributed tables, if you have any (unless the
> table DDL has # of buckets defined, I think).
>
> Increase default_hash_table_bucket_number to 18 and retry the insert.
> Since you have 18 ext table URLs, it should work with the change.
>
> Thanks
> Vineet
>
>
> On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yin....@163.com> wrote:
>
>>
>> My environment is >>>> : 1 master 3 segments
>> SQL >>>>>>:
>> CREATE TABLE call_center (
>>     cc_call_center_sk integer,
>>     cc_call_center_id character varying(16),
>>     cc_rec_start_date date,
>>     cc_rec_end_date date,
>>     cc_closed_date_sk integer,
>>     cc_open_date_sk integer,
>>     cc_name character varying(50),
>>     cc_class character varying(50),
>>     cc_employees integer,
>>     cc_sq_ft integer,
>>     cc_hours character varying(20),
>>     cc_manager character varying(40),
>>     cc_mkt_id integer,
>>     cc_mkt_class character varying(50),
>>     cc_mkt_desc character varying(100),
>>     cc_market_manager character varying(40),
>>     cc_division text,
>>     cc_division_name character varying(50),
>>     cc_company text,
>>     cc_company_name character varying(50),
>>     cc_street_number character varying(10),
>>     cc_street_name character varying(60),
>>     cc_street_type character varying(15),
>>     cc_suite_number character varying(10),
>>     cc_city character varying(60),
>>     cc_county character varying(30),
>>     cc_state text,
>>     cc_zip character varying(10),
>>     cc_country character varying(20),
>>     cc_gmt_offset numeric(5,2),
>>     cc_tax_percentage numeric(5,2)
>> )
>> WITH (appendonly=true, orientation=parquet)
>> DISTRIBUTED BY (cc_call_center_sk);
>>
>> CREATE EXTERNAL TABLE ext_call_center (like call_center)
>> LOCATION (
>> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
>> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>>
>> insert into call_center select * from ext_call_center;
>>
>> ERROR:  External scan error: There are more external files (URLs) than
>> primary segments that can read them (COptTasks.cpp:1756)
>>
>> hawq config >>>>:
>>
>> GUC : add_missing_from Value : off
>> GUC : application_name Value :
>> GUC : array_nulls Value : on
>> GUC : authentication_timeout Value : 60
>> GUC : backslash_quote Value : safe_encoding
>> GUC : block_size Value : 32768
>> GUC : bonjour_name Value :
>> GUC : check_function_bodies Value : on
>> GUC : client_encoding Value : UTF8
>> GUC : client_min_messages Value : ERROR
>> GUC : cpu_index_tuple_cost Value : 0.005
>> GUC : cpu_operator_cost Value : 0.0025
>> GUC : cpu_tuple_cost Value : 0.01
>> GUC : cursor_tuple_fraction Value : 1
>> GUC : custom_variable_classes Value :
>> GUC : DateStyle Value : ISO, MDY
>> GUC : db_user_namespace Value : off
>> GUC : deadlock_timeout Value : 1000
>> GUC : debug_assertions Value : off
>> GUC : debug_pretty_print Value : off
>> GUC : debug_print_parse Value : off
>> GUC : debug_print_plan Value : off
>> GUC : debug_print_prelim_plan Value : off
>> GUC : debug_print_rewritten Value : off
>> GUC : debug_print_slice_table Value : off
>> GUC : default_hash_table_bucket_number Value : 6
>> GUC : default_statement_mem Value : 128000
>> GUC : default_statistics_target Value : 25
>> GUC : default_tablespace Value :
>> GUC : default_transaction_isolation Value : read committed
>> GUC : default_transaction_read_only Value : off
>> GUC : dfs_url Value : localhost:8020/hawq
>> GUC : dynamic_library_path Value : $libdir
>> GUC : effective_cache_size Value : 16384
>> GUC : enable_bitmapscan Value : on
>> GUC : enable_groupagg Value : on
>> GUC : enable_hashagg Value : on
>> GUC : enable_hashjoin Value : on
>> GUC : enable_indexscan Value : on
>> GUC : enable_mergejoin Value : off
>> GUC : enable_nestloop Value : off
>> GUC : enable_secure_filesystem Value : off
>> GUC : enable_seqscan Value : on
>> GUC : enable_sort Value : on
>> GUC : enable_tidscan Value : on
>> GUC : escape_string_warning Value : on
>> GUC : explain_memory_verbosity Value : suppress
>> GUC : explain_pretty_print Value : on
>> GUC : extra_float_digits Value : 0
>> GUC : filesystem_support_truncate Value : on
>> GUC : from_collapse_limit Value : 20
>> GUC : gp_adjust_selectivity_for_outerjoins Value : on
>> GUC : gp_analyze_relative_error Value : 0.25
>> GUC : gp_autostats_mode Value : NONE
>> GUC : gp_autostats_on_change_threshold Value : 2147483647
>> GUC : gp_backup_directIO Value : off
>> GUC : gp_backup_directIO_read_chunk_mb Value : 20
>> GUC : gp_cached_segworkers_threshold Value : 5
>> GUC : gp_command_count Value : 6
>> GUC : gp_connections_per_thread Value : 64
>> GUC : gp_debug_linger Value : 0
>> GUC : gp_dynamic_partition_pruning Value : on
>> GUC : gp_email_connect_avoid_duration Value : 7200
>> GUC : gp_email_connect_failures Value : 5
>> GUC : gp_email_connect_timeout Value : 15
>> GUC : gp_email_from Value :
>> GUC : gp_email_smtp_password Value :
>> GUC : gp_email_smtp_server Value : localhost:25
>> GUC : gp_email_smtp_userid Value :
>> GUC : gp_email_to Value :
>> GUC : gp_enable_agg_distinct Value : on
>> GUC : gp_enable_agg_distinct_pruning Value : on
>> GUC : gp_enable_direct_dispatch Value : on
>> GUC : gp_enable_fallback_plan Value : on
>> GUC : gp_enable_fast_sri Value : on
>> GUC : gp_enable_gpperfmon Value : off
>> GUC : gp_enable_groupext_distinct_gather Value : on
>> GUC : gp_enable_groupext_distinct_pruning Value : on
>> GUC : gp_enable_multiphase_agg Value : on
>> GUC : gp_enable_predicate_propagation Value : on
>> GUC : gp_enable_preunique Value : on
>> GUC : gp_enable_sequential_window_plans Value : on
>> GUC : gp_enable_sort_distinct Value : on
>> GUC : gp_enable_sort_limit Value : on
>> GUC : gp_external_enable_exec Value : on
>> GUC : gp_external_grant_privileges Value : off
>> GUC : gp_external_max_segs Value : 64
>> GUC : gp_filerep_tcp_keepalives_count Value : 2
>> GUC : gp_filerep_tcp_keepalives_idle Value : 60
>> GUC : gp_filerep_tcp_keepalives_interval Value : 30
>> GUC : gp_force_use_default_temporary_directory Value : off
>> GUC : gp_gpperfmon_send_interval Value : 1
>> GUC : gp_hashjoin_tuples_per_bucket Value : 5
>> GUC : gp_idf_deduplicate Value : auto
>> GUC : gp_interconnect_cache_future_packets Value : on
>> GUC : gp_interconnect_default_rtt Value : 20
>> GUC : gp_interconnect_fc_method Value : LOSS
>> GUC : gp_interconnect_hash_multiplier Value : 2
>> GUC : gp_interconnect_min_retries_before_timeout Value : 100
>> GUC : gp_interconnect_min_rto Value : 20
>> GUC : gp_interconnect_queue_depth Value : 4
>> GUC : gp_interconnect_setup_timeout Value : 7200
>> GUC : gp_interconnect_snd_queue_depth Value : 2
>> GUC : gp_interconnect_timer_checking_period Value : 20
>> GUC : gp_interconnect_timer_period Value : 5
>> GUC : gp_interconnect_transmit_timeout Value : 3600
>> GUC : gp_interconnect_type Value : UDP
>> GUC : gp_log_format Value : csv
>> GUC : gp_max_csv_line_length Value : 1048576
>> GUC : gp_max_databases Value : 16
>> GUC : gp_max_filespaces Value : 8
>> GUC : gp_max_packet_size Value : 8192
>> GUC : gp_max_partition_level Value : 0
>> GUC : gp_max_plan_size Value : 0
>> GUC : gp_max_relations Value : 65536
>> GUC : gp_max_tablespaces Value : 16
>> GUC : gp_motion_cost_per_row Value : 0
>> GUC : gp_num_contents_in_cluster Value : -10000
>> GUC : gp_query_context_mem_limit Value : 102400
>> GUC : gp_reject_percent_threshold Value : 300
>> GUC : gp_reraise_signal Value : on
>> GUC : gp_role Value : utility
>> GUC : gp_safefswritesize Value : 0
>> GUC : gp_segment_connect_timeout Value : 600
>> GUC : gp_segments_for_planner Value : 0
>> GUC : gp_session_id Value : -1
>> GUC : gp_set_proc_affinity Value : off
>> GUC : gp_snmp_community Value : public
>> GUC : gp_snmp_monitor_address Value :
>> GUC : gp_snmp_use_inform_or_trap Value : trap
>> GUC : gp_statistics_pullup_from_child_partition Value : on
>> GUC : gp_statistics_use_fkeys Value : on
>> GUC : gp_subtrans_warn_limit Value : 16777216
>> GUC : gp_temporary_directory_mark_error Value : 0
>> GUC : gp_udp_bufsize_k Value : 0
>> GUC : gp_vmem_protect_segworker_cache_limit Value : 500
>> GUC : gp_workfile_checksumming Value : on
>> GUC : gp_workfile_compress_algorithm Value : none
>> GUC : gp_workfile_limit_per_query Value : 0
>> GUC : gp_workfile_limit_per_segment Value : 0
>> GUC : gpperfmon_port Value : 8888
>> GUC : hawq_global_rm_type Value : yarn
>> GUC : hawq_master_address_host Value : master.bigdata
>> GUC : hawq_master_address_port Value : 6432
>> GUC : hawq_master_temp_directory Value : /tmp
>> GUC : hawq_metadata_cache_block_capacity Value : 2097152
>> GUC : hawq_metadata_cache_check_interval Value : 30
>> GUC : hawq_metadata_cache_flush_ratio Value : 0.85
>> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05
>> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2
>> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288
>> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7
>> GUC : hawq_metadata_cache_refresh_interval Value : 3600
>> GUC : hawq_metadata_cache_refresh_max_num Value : 1000
>> GUC : hawq_metadata_cache_refresh_timeout Value : 3600
>> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn
>> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup
>> GUC : hawq_re_cpu_enable Value : off
>> GUC : hawq_re_cpu_weight Value : 1024
>> GUC : hawq_re_memory_overcommit_max Value : 8192
>> GUC : hawq_re_vcore_pcore_ratio Value : 1
>> GUC : hawq_rm_cluster_report_period Value : 60
>> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5
>> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2
>> GUC : hawq_rm_container_batch_limit Value : 1000
>> GUC : hawq_rm_enable_connpool Value : on
>> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on
>> GUC : hawq_rm_force_fifo_queuing Value : on
>> GUC : hawq_rm_master_domain_port Value : 5436
>> GUC : hawq_rm_master_port Value : 5437
>> GUC : hawq_rm_memory_limit_perseg Value : 480GB
>> GUC : hawq_rm_min_resource_perseg Value : 2
>> GUC : hawq_rm_nocluster_timeout Value : 60
>> GUC : hawq_rm_nresqueue_limit Value : 128
>> GUC : hawq_rm_nslice_perseg_limit Value : 5000
>> GUC : hawq_rm_nvcore_limit_perseg Value : 16
>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
>> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6
>> GUC : hawq_rm_nvseg_perquery_limit Value : 512
>> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6
>> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1
>> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
>> GUC : hawq_rm_regularize_io_factor Value : 1
>> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11
>> GUC : hawq_rm_regularize_nvseg_factor Value : 1
>> GUC : hawq_rm_regularize_nvseg_max Value : 300
>> GUC : hawq_rm_regularize_usage_factor Value : 1
>> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
>> GUC : hawq_rm_request_timeoutcheck_interval Value : 1
>> GUC : hawq_rm_resource_allocation_timeout Value : 600
>> GUC : hawq_rm_resource_idle_timeout Value : 300
>> GUC : hawq_rm_respool_test_file Value :
>> GUC : hawq_rm_return_percent_on_overcommit Value : 10
>> GUC : hawq_rm_segment_config_refresh_interval Value : 30
>> GUC : hawq_rm_segment_heartbeat_interval Value : 30
>> GUC : hawq_rm_segment_heartbeat_timeout Value : 300
>> GUC : hawq_rm_segment_port Value : 5438
>> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300
>> GUC : hawq_rm_session_lease_heartbeat_enable Value : on
>> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10
>> GUC : hawq_rm_session_lease_timeout Value : 180
>> GUC : hawq_rm_stmt_nvseg Value : 0
>> GUC : hawq_rm_stmt_vseg_memory Value : 128mb
>> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25
>> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050
>> GUC : hawq_rm_yarn_app_name Value : hawq
>> GUC : hawq_rm_yarn_queue_name Value : default
>> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030
>> GUC : hawq_segment_address_port Value : 40000
>> GUC : hawq_segment_history_keep_period Value : 365
>> GUC : hawq_segment_temp_directory Value : /tmp
>> GUC : integer_datetimes Value : on
>> GUC : IntervalStyle Value : postgres
>> GUC : join_collapse_limit Value : 20
>> GUC : krb5_ccname Value : /tmp/postgres.ccname
>> GUC : krb_caseins_users Value : off
>> GUC : krb_server_keyfile Value : FILE:/data/pulse2-agent/
>> agents/agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-
>> dist/etc/krb5.keytab
>> GUC : krb_srvname Value : postgres
>> GUC : lc_collate Value : C
>> GUC : lc_ctype Value : C
>> GUC : lc_messages Value : en_US.utf8
>> GUC : lc_monetary Value : en_US.utf8
>> GUC : lc_numeric Value : en_US.utf8
>> GUC : lc_time Value : en_US.utf8
>> GUC : listen_addresses Value : *
>> GUC : local_preload_libraries Value :
>> GUC : log_autostats Value : off
>> GUC : log_connections Value : off
>> GUC : log_disconnections Value : off
>> GUC : log_dispatch_stats Value : off
>> GUC : log_duration Value : off
>> GUC : log_error_verbosity Value : default
>> GUC : log_executor_stats Value : off
>> GUC : log_hostname Value : off
>> GUC : log_min_duration_statement Value : -1
>> GUC : log_min_error_statement Value : error
>> GUC : log_min_messages Value : warning
>> GUC : log_parser_stats Value : off
>> GUC : log_planner_stats Value : off
>> GUC : log_rotation_age Value : 1440
>> GUC : log_rotation_size Value : 0
>> GUC : log_statement Value : none
>> GUC : log_statement_stats Value : off
>> GUC : log_timezone Value : PRC
>> GUC : log_truncate_on_rotation Value : off
>> GUC : maintenance_work_mem Value : 65536
>> GUC : master_directory Value :
>> GUC : max_appendonly_segfiles Value : 262144
>> GUC : max_appendonly_tables Value : 10000
>> GUC : max_connections Value : 1280
>> GUC : max_files_per_process Value : 150
>> GUC : max_fsm_pages Value : 200000
>> GUC : max_fsm_relations Value : 1000
>> GUC : max_function_args Value : 100
>> GUC : max_identifier_length Value : 63
>> GUC : max_index_keys Value : 32
>> GUC : max_locks_per_transaction Value : 128
>> GUC : max_prepared_transactions Value : 250
>> GUC : max_stack_depth Value : 2048
>> GUC : max_work_mem Value : 1024000
>> GUC : metadata_cache_testfile Value :
>> GUC : optimizer Value : on
>> GUC : optimizer_analyze_root_partition Value : on
>> GUC : optimizer_minidump Value : onerror
>> GUC : optimizer_parts_to_force_sort_on_insert Value : 160
>> GUC : password_encryption Value : on
>> GUC : password_hash_algorithm Value : MD5
>> GUC : pljava_classpath Value :
>> GUC : pljava_release_lingering_savepoints Value : off
>> GUC : pljava_statement_cache_size Value : 0
>> GUC : pljava_vmoptions Value :
>> GUC : port Value : 6432
>> GUC : pxf_enable_filter_pushdown Value : on
>> GUC : pxf_enable_locality_optimizations Value : on
>> GUC : pxf_enable_stat_collection Value : on
>> GUC : pxf_remote_service_login Value :
>> GUC : pxf_remote_service_secret Value :
>> GUC : pxf_service_address Value : localhost:51200
>> GUC : pxf_stat_max_fragments Value : 100
>> GUC : random_page_cost Value : 100
>> GUC : regex_flavor Value : advanced
>> GUC : runaway_detector_activation_percent Value : 95
>> GUC : search_path Value : "$user",public
>> GUC : seg_max_connections Value : 3000
>> GUC : segment_directory Value :
>> GUC : seq_page_cost Value : 1
>> GUC : server_encoding Value : UTF8
>> GUC : server_ticket_renew_interval Value : 43200000
>> GUC : server_version Value : 8.2.15
>> GUC : server_version_num Value : 80215
>> GUC : shared_buffers Value : 4000
>> GUC : shared_preload_libraries Value :
>> GUC : ssl Value : off
>> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
>> GUC : standard_conforming_strings Value : off
>> GUC : standby_address_host Value : localhost
>> GUC : statement_timeout Value : 0
>> GUC : superuser_reserved_connections Value : 3
>> GUC : tcp_keepalives_count Value : 9
>> GUC : tcp_keepalives_idle Value : 7200
>> GUC : tcp_keepalives_interval Value : 75
>> GUC : temp_buffers Value : 1024
>> GUC : TimeZone Value : PRC
>> GUC : timezone_abbreviations Value : Default
>> GUC : track_activities Value : on
>> GUC : track_counts Value : off
>> GUC : transaction_isolation Value : read committed
>> GUC : transaction_read_only Value : off
>> GUC : transform_null_equals Value : off
>> GUC : unix_socket_directory Value :
>> GUC : unix_socket_group Value :
>> GUC : unix_socket_permissions Value : 511
>> GUC : update_process_title Value : on
>> GUC : vacuum_cost_delay Value : 0
>> GUC : vacuum_cost_limit Value : 200
>> GUC : vacuum_cost_page_dirty Value : 20
>> GUC : vacuum_cost_page_miss Value : 10
>> GUC : vacuum_freeze_min_age Value : 100000000
>> GUC : work_mem Value : 51200
>>
>>
>>
>>
>>
>> At 2016-09-21 13:00:41, "Vineet Goel" <vvin...@apache.org> wrote:
>>
>> Could you please post your SQL DDL statement? How many URLs do you have
>> in your external table? Also, your HASH dist table - how many buckets are
>> defined, if any? Are the # of URLs more than the # of buckets or
>> default_hash_table_bucket_number value? Perhaps you can attach your
>> hawq-site.xml file as well.
>>
>> Also see:
>> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html
>>
>> Thanks
>> Vineet
>>
>>
>> On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yin....@163.com> wrote:
>>
>>> Hi,all:
>>>     I am testing hawq 2.0.0 , and I find a problem like this:
>>>  I load data from an external table (created using "like target_table"
>>> statement) ,
>>> if the target table was distributed by some column(s), it raise this
>>> error:
>>>  External scan error: There are more external files (URLs) than primary
>>> segments that can read them (COptTasks.cpp:1756)
>>> if the target table was distributed randomly, it works well,
>>> I don't set any parameter special,does anybody know how to resolve this
>>> problem?
>>> thanks a lot.
>>>
>>

Reply via email to