Your default_hash_table_bucket_number value is set to 6. Typically, this should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you should set this value to 18. Any time you change this parameter, you should redistribute your HASH distributed tables, if you have any (unless the table DDL has # of buckets defined, I think).
Increase default_hash_table_bucket_number to 18 and retry the insert. Since you have 18 ext table URLs, it should work with the change. Thanks Vineet On Wed, Sep 21, 2016 at 12:26 AM 来熊 <[email protected]> wrote: > > My environment is >>>> : 1 master 3 segments > SQL >>>>>>: > CREATE TABLE call_center ( > cc_call_center_sk integer, > cc_call_center_id character varying(16), > cc_rec_start_date date, > cc_rec_end_date date, > cc_closed_date_sk integer, > cc_open_date_sk integer, > cc_name character varying(50), > cc_class character varying(50), > cc_employees integer, > cc_sq_ft integer, > cc_hours character varying(20), > cc_manager character varying(40), > cc_mkt_id integer, > cc_mkt_class character varying(50), > cc_mkt_desc character varying(100), > cc_market_manager character varying(40), > cc_division text, > cc_division_name character varying(50), > cc_company text, > cc_company_name character varying(50), > cc_street_number character varying(10), > cc_street_name character varying(60), > cc_street_type character varying(15), > cc_suite_number character varying(10), > cc_city character varying(60), > cc_county character varying(30), > cc_state text, > cc_zip character varying(10), > cc_country character varying(20), > cc_gmt_offset numeric(5,2), > cc_tax_percentage numeric(5,2) > ) > WITH (appendonly=true, orientation=parquet) > DISTRIBUTED BY (cc_call_center_sk); > > CREATE EXTERNAL TABLE ext_call_center (like call_center) > LOCATION ( > 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat') > FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\'); > > insert into call_center select * from ext_call_center; > > ERROR: External scan error: There are more external files (URLs) than > primary segments that can read them (COptTasks.cpp:1756) > > hawq config >>>>: > > GUC : add_missing_from Value : off > GUC : application_name Value : > GUC : array_nulls Value : on > GUC : authentication_timeout Value : 60 > GUC : backslash_quote Value : safe_encoding > GUC : block_size Value : 32768 > GUC : bonjour_name Value : > GUC : check_function_bodies Value : on > GUC : client_encoding Value : UTF8 > GUC : client_min_messages Value : ERROR > GUC : cpu_index_tuple_cost Value : 0.005 > GUC : cpu_operator_cost Value : 0.0025 > GUC : cpu_tuple_cost Value : 0.01 > GUC : cursor_tuple_fraction Value : 1 > GUC : custom_variable_classes Value : > GUC : DateStyle Value : ISO, MDY > GUC : db_user_namespace Value : off > GUC : deadlock_timeout Value : 1000 > GUC : debug_assertions Value : off > GUC : debug_pretty_print Value : off > GUC : debug_print_parse Value : off > GUC : debug_print_plan Value : off > GUC : debug_print_prelim_plan Value : off > GUC : debug_print_rewritten Value : off > GUC : debug_print_slice_table Value : off > GUC : default_hash_table_bucket_number Value : 6 > GUC : default_statement_mem Value : 128000 > GUC : default_statistics_target Value : 25 > GUC : default_tablespace Value : > GUC : default_transaction_isolation Value : read committed > GUC : default_transaction_read_only Value : off > GUC : dfs_url Value : localhost:8020/hawq > GUC : dynamic_library_path Value : $libdir > GUC : effective_cache_size Value : 16384 > GUC : enable_bitmapscan Value : on > GUC : enable_groupagg Value : on > GUC : enable_hashagg Value : on > GUC : enable_hashjoin Value : on > GUC : enable_indexscan Value : on > GUC : enable_mergejoin Value : off > GUC : enable_nestloop Value : off > GUC : enable_secure_filesystem Value : off > GUC : enable_seqscan Value : on > GUC : enable_sort Value : on > GUC : enable_tidscan Value : on > GUC : escape_string_warning Value : on > GUC : explain_memory_verbosity Value : suppress > GUC : explain_pretty_print Value : on > GUC : extra_float_digits Value : 0 > GUC : filesystem_support_truncate Value : on > GUC : from_collapse_limit Value : 20 > GUC : gp_adjust_selectivity_for_outerjoins Value : on > GUC : gp_analyze_relative_error Value : 0.25 > GUC : gp_autostats_mode Value : NONE > GUC : gp_autostats_on_change_threshold Value : 2147483647 > GUC : gp_backup_directIO Value : off > GUC : gp_backup_directIO_read_chunk_mb Value : 20 > GUC : gp_cached_segworkers_threshold Value : 5 > GUC : gp_command_count Value : 6 > GUC : gp_connections_per_thread Value : 64 > GUC : gp_debug_linger Value : 0 > GUC : gp_dynamic_partition_pruning Value : on > GUC : gp_email_connect_avoid_duration Value : 7200 > GUC : gp_email_connect_failures Value : 5 > GUC : gp_email_connect_timeout Value : 15 > GUC : gp_email_from Value : > GUC : gp_email_smtp_password Value : > GUC : gp_email_smtp_server Value : localhost:25 > GUC : gp_email_smtp_userid Value : > GUC : gp_email_to Value : > GUC : gp_enable_agg_distinct Value : on > GUC : gp_enable_agg_distinct_pruning Value : on > GUC : gp_enable_direct_dispatch Value : on > GUC : gp_enable_fallback_plan Value : on > GUC : gp_enable_fast_sri Value : on > GUC : gp_enable_gpperfmon Value : off > GUC : gp_enable_groupext_distinct_gather Value : on > GUC : gp_enable_groupext_distinct_pruning Value : on > GUC : gp_enable_multiphase_agg Value : on > GUC : gp_enable_predicate_propagation Value : on > GUC : gp_enable_preunique Value : on > GUC : gp_enable_sequential_window_plans Value : on > GUC : gp_enable_sort_distinct Value : on > GUC : gp_enable_sort_limit Value : on > GUC : gp_external_enable_exec Value : on > GUC : gp_external_grant_privileges Value : off > GUC : gp_external_max_segs Value : 64 > GUC : gp_filerep_tcp_keepalives_count Value : 2 > GUC : gp_filerep_tcp_keepalives_idle Value : 60 > GUC : gp_filerep_tcp_keepalives_interval Value : 30 > GUC : gp_force_use_default_temporary_directory Value : off > GUC : gp_gpperfmon_send_interval Value : 1 > GUC : gp_hashjoin_tuples_per_bucket Value : 5 > GUC : gp_idf_deduplicate Value : auto > GUC : gp_interconnect_cache_future_packets Value : on > GUC : gp_interconnect_default_rtt Value : 20 > GUC : gp_interconnect_fc_method Value : LOSS > GUC : gp_interconnect_hash_multiplier Value : 2 > GUC : gp_interconnect_min_retries_before_timeout Value : 100 > GUC : gp_interconnect_min_rto Value : 20 > GUC : gp_interconnect_queue_depth Value : 4 > GUC : gp_interconnect_setup_timeout Value : 7200 > GUC : gp_interconnect_snd_queue_depth Value : 2 > GUC : gp_interconnect_timer_checking_period Value : 20 > GUC : gp_interconnect_timer_period Value : 5 > GUC : gp_interconnect_transmit_timeout Value : 3600 > GUC : gp_interconnect_type Value : UDP > GUC : gp_log_format Value : csv > GUC : gp_max_csv_line_length Value : 1048576 > GUC : gp_max_databases Value : 16 > GUC : gp_max_filespaces Value : 8 > GUC : gp_max_packet_size Value : 8192 > GUC : gp_max_partition_level Value : 0 > GUC : gp_max_plan_size Value : 0 > GUC : gp_max_relations Value : 65536 > GUC : gp_max_tablespaces Value : 16 > GUC : gp_motion_cost_per_row Value : 0 > GUC : gp_num_contents_in_cluster Value : -10000 > GUC : gp_query_context_mem_limit Value : 102400 > GUC : gp_reject_percent_threshold Value : 300 > GUC : gp_reraise_signal Value : on > GUC : gp_role Value : utility > GUC : gp_safefswritesize Value : 0 > GUC : gp_segment_connect_timeout Value : 600 > GUC : gp_segments_for_planner Value : 0 > GUC : gp_session_id Value : -1 > GUC : gp_set_proc_affinity Value : off > GUC : gp_snmp_community Value : public > GUC : gp_snmp_monitor_address Value : > GUC : gp_snmp_use_inform_or_trap Value : trap > GUC : gp_statistics_pullup_from_child_partition Value : on > GUC : gp_statistics_use_fkeys Value : on > GUC : gp_subtrans_warn_limit Value : 16777216 > GUC : gp_temporary_directory_mark_error Value : 0 > GUC : gp_udp_bufsize_k Value : 0 > GUC : gp_vmem_protect_segworker_cache_limit Value : 500 > GUC : gp_workfile_checksumming Value : on > GUC : gp_workfile_compress_algorithm Value : none > GUC : gp_workfile_limit_per_query Value : 0 > GUC : gp_workfile_limit_per_segment Value : 0 > GUC : gpperfmon_port Value : 8888 > GUC : hawq_global_rm_type Value : yarn > GUC : hawq_master_address_host Value : master.bigdata > GUC : hawq_master_address_port Value : 6432 > GUC : hawq_master_temp_directory Value : /tmp > GUC : hawq_metadata_cache_block_capacity Value : 2097152 > GUC : hawq_metadata_cache_check_interval Value : 30 > GUC : hawq_metadata_cache_flush_ratio Value : 0.85 > GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05 > GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2 > GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288 > GUC : hawq_metadata_cache_reduce_ratio Value : 0.7 > GUC : hawq_metadata_cache_refresh_interval Value : 3600 > GUC : hawq_metadata_cache_refresh_max_num Value : 1000 > GUC : hawq_metadata_cache_refresh_timeout Value : 3600 > GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn > GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup > GUC : hawq_re_cpu_enable Value : off > GUC : hawq_re_cpu_weight Value : 1024 > GUC : hawq_re_memory_overcommit_max Value : 8192 > GUC : hawq_re_vcore_pcore_ratio Value : 1 > GUC : hawq_rm_cluster_report_period Value : 60 > GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5 > GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2 > GUC : hawq_rm_container_batch_limit Value : 1000 > GUC : hawq_rm_enable_connpool Value : on > GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on > GUC : hawq_rm_force_fifo_queuing Value : on > GUC : hawq_rm_master_domain_port Value : 5436 > GUC : hawq_rm_master_port Value : 5437 > GUC : hawq_rm_memory_limit_perseg Value : 480GB > GUC : hawq_rm_min_resource_perseg Value : 2 > GUC : hawq_rm_nocluster_timeout Value : 60 > GUC : hawq_rm_nresqueue_limit Value : 128 > GUC : hawq_rm_nslice_perseg_limit Value : 5000 > GUC : hawq_rm_nvcore_limit_perseg Value : 16 > GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512 > GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8 > GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256 > GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4 > GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6 > GUC : hawq_rm_nvseg_perquery_limit Value : 512 > GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6 > GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1 > GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2 > GUC : hawq_rm_regularize_io_factor Value : 1 > GUC : hawq_rm_regularize_io_max Value : 1.37439e+11 > GUC : hawq_rm_regularize_nvseg_factor Value : 1 > GUC : hawq_rm_regularize_nvseg_max Value : 300 > GUC : hawq_rm_regularize_usage_factor Value : 1 > GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25 > GUC : hawq_rm_request_timeoutcheck_interval Value : 1 > GUC : hawq_rm_resource_allocation_timeout Value : 600 > GUC : hawq_rm_resource_idle_timeout Value : 300 > GUC : hawq_rm_respool_test_file Value : > GUC : hawq_rm_return_percent_on_overcommit Value : 10 > GUC : hawq_rm_segment_config_refresh_interval Value : 30 > GUC : hawq_rm_segment_heartbeat_interval Value : 30 > GUC : hawq_rm_segment_heartbeat_timeout Value : 300 > GUC : hawq_rm_segment_port Value : 5438 > GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300 > GUC : hawq_rm_session_lease_heartbeat_enable Value : on > GUC : hawq_rm_session_lease_heartbeat_interval Value : 10 > GUC : hawq_rm_session_lease_timeout Value : 180 > GUC : hawq_rm_stmt_nvseg Value : 0 > GUC : hawq_rm_stmt_vseg_memory Value : 128mb > GUC : hawq_rm_tolerate_nseg_limit Value : 0.25 > GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050 > GUC : hawq_rm_yarn_app_name Value : hawq > GUC : hawq_rm_yarn_queue_name Value : default > GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030 > GUC : hawq_segment_address_port Value : 40000 > GUC : hawq_segment_history_keep_period Value : 365 > GUC : hawq_segment_temp_directory Value : /tmp > GUC : integer_datetimes Value : on > GUC : IntervalStyle Value : postgres > GUC : join_collapse_limit Value : 20 > GUC : krb5_ccname Value : /tmp/postgres.ccname > GUC : krb_caseins_users Value : off > GUC : krb_server_keyfile Value : > FILE:/data/pulse2-agent/agents/agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc/krb5.keytab > GUC : krb_srvname Value : postgres > GUC : lc_collate Value : C > GUC : lc_ctype Value : C > GUC : lc_messages Value : en_US.utf8 > GUC : lc_monetary Value : en_US.utf8 > GUC : lc_numeric Value : en_US.utf8 > GUC : lc_time Value : en_US.utf8 > GUC : listen_addresses Value : * > GUC : local_preload_libraries Value : > GUC : log_autostats Value : off > GUC : log_connections Value : off > GUC : log_disconnections Value : off > GUC : log_dispatch_stats Value : off > GUC : log_duration Value : off > GUC : log_error_verbosity Value : default > GUC : log_executor_stats Value : off > GUC : log_hostname Value : off > GUC : log_min_duration_statement Value : -1 > GUC : log_min_error_statement Value : error > GUC : log_min_messages Value : warning > GUC : log_parser_stats Value : off > GUC : log_planner_stats Value : off > GUC : log_rotation_age Value : 1440 > GUC : log_rotation_size Value : 0 > GUC : log_statement Value : none > GUC : log_statement_stats Value : off > GUC : log_timezone Value : PRC > GUC : log_truncate_on_rotation Value : off > GUC : maintenance_work_mem Value : 65536 > GUC : master_directory Value : > GUC : max_appendonly_segfiles Value : 262144 > GUC : max_appendonly_tables Value : 10000 > GUC : max_connections Value : 1280 > GUC : max_files_per_process Value : 150 > GUC : max_fsm_pages Value : 200000 > GUC : max_fsm_relations Value : 1000 > GUC : max_function_args Value : 100 > GUC : max_identifier_length Value : 63 > GUC : max_index_keys Value : 32 > GUC : max_locks_per_transaction Value : 128 > GUC : max_prepared_transactions Value : 250 > GUC : max_stack_depth Value : 2048 > GUC : max_work_mem Value : 1024000 > GUC : metadata_cache_testfile Value : > GUC : optimizer Value : on > GUC : optimizer_analyze_root_partition Value : on > GUC : optimizer_minidump Value : onerror > GUC : optimizer_parts_to_force_sort_on_insert Value : 160 > GUC : password_encryption Value : on > GUC : password_hash_algorithm Value : MD5 > GUC : pljava_classpath Value : > GUC : pljava_release_lingering_savepoints Value : off > GUC : pljava_statement_cache_size Value : 0 > GUC : pljava_vmoptions Value : > GUC : port Value : 6432 > GUC : pxf_enable_filter_pushdown Value : on > GUC : pxf_enable_locality_optimizations Value : on > GUC : pxf_enable_stat_collection Value : on > GUC : pxf_remote_service_login Value : > GUC : pxf_remote_service_secret Value : > GUC : pxf_service_address Value : localhost:51200 > GUC : pxf_stat_max_fragments Value : 100 > GUC : random_page_cost Value : 100 > GUC : regex_flavor Value : advanced > GUC : runaway_detector_activation_percent Value : 95 > GUC : search_path Value : "$user",public > GUC : seg_max_connections Value : 3000 > GUC : segment_directory Value : > GUC : seq_page_cost Value : 1 > GUC : server_encoding Value : UTF8 > GUC : server_ticket_renew_interval Value : 43200000 > GUC : server_version Value : 8.2.15 > GUC : server_version_num Value : 80215 > GUC : shared_buffers Value : 4000 > GUC : shared_preload_libraries Value : > GUC : ssl Value : off > GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH > GUC : standard_conforming_strings Value : off > GUC : standby_address_host Value : localhost > GUC : statement_timeout Value : 0 > GUC : superuser_reserved_connections Value : 3 > GUC : tcp_keepalives_count Value : 9 > GUC : tcp_keepalives_idle Value : 7200 > GUC : tcp_keepalives_interval Value : 75 > GUC : temp_buffers Value : 1024 > GUC : TimeZone Value : PRC > GUC : timezone_abbreviations Value : Default > GUC : track_activities Value : on > GUC : track_counts Value : off > GUC : transaction_isolation Value : read committed > GUC : transaction_read_only Value : off > GUC : transform_null_equals Value : off > GUC : unix_socket_directory Value : > GUC : unix_socket_group Value : > GUC : unix_socket_permissions Value : 511 > GUC : update_process_title Value : on > GUC : vacuum_cost_delay Value : 0 > GUC : vacuum_cost_limit Value : 200 > GUC : vacuum_cost_page_dirty Value : 20 > GUC : vacuum_cost_page_miss Value : 10 > GUC : vacuum_freeze_min_age Value : 100000000 > GUC : work_mem Value : 51200 > > > > > > At 2016-09-21 13:00:41, "Vineet Goel" <[email protected]> wrote: > > Could you please post your SQL DDL statement? How many URLs do you have in > your external table? Also, your HASH dist table - how many buckets are > defined, if any? Are the # of URLs more than the # of buckets or > default_hash_table_bucket_number value? Perhaps you can attach your > hawq-site.xml file as well. > > Also see: > http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html > > Thanks > Vineet > > > On Tue, Sep 20, 2016 at 7:07 PM 来熊 <[email protected]> wrote: > >> Hi,all: >> I am testing hawq 2.0.0 , and I find a problem like this: >> I load data from an external table (created using "like target_table" >> statement) , >> if the target table was distributed by some column(s), it raise this >> error: >> External scan error: There are more external files (URLs) than primary >> segments that can read them (COptTasks.cpp:1756) >> if the target table was distributed randomly, it works well, >> I don't set any parameter special,does anybody know how to resolve this >> problem? >> thanks a lot. >> >
