Hi all, Here is my second evaluation report, together with a simple program that I was able to compile with my parallel version of GCC. Keep in mind that I still have lots of concurrent issues inside the compiler and therefore my branch will fail to compile pretty much anything else.
To reproduce my current branch, use the following steps: 1-) Clone https://gitlab.com/flusp/gcc/tree/giulianob_parallel 2-) Edit gcc/graphunit.c's variable `num_threads` to 1. 3-) Compile with --disable-bootstrap --enable-languages=c 4-) make 5-) Edit gcc/graphunit.c's variable `num_threads` to 2, for instance. 6-) make install DESTDIR="somewhere_else_that_doesnt_break_your_gcc" 7-) compile the program using -O2 I a attaching my report in markdown format, which you can convert to pdf using `pandoc` if you find it difficult to read in the current format. I am also open to suggestions. Please do not hesitate to comment :) Thank you, Giuliano.
#include <stdio.h> #include <math.h> #include <stdlib.h> #include <time.h> #include <stdint.h> #include <time.h> #include <sys/time.h> #define MAX 1000 double //__attribute__ ((noinline, optimize("-O0"))) sinatan2(double x, double y) { return sin(atan2(x, y)); } #define EPS 1e-160 #define HUGE 1e150 #define INF 1/0. double //__attribute__ ((noinline)) sinatan2_(double x, double y) { static double hue; //if (!(*((uint64_t*) &x) & 0x7FFFFFFFFFFFFFFFULL)) if (fpclassify(x) == FP_ZERO) return x; else { if (y == 0 || x > INF || x < -INF) return copysign(1., x); else { double x_abs = fabs(x); double y_abs = fabs(y); if (x_abs < EPS || y_abs < EPS) { double alpha = 1/EPS; x = alpha*x; y = alpha*y; } if (x_abs > HUGE || y_abs > HUGE) { double alpha = 1/HUGE; x = alpha*x; y = alpha*y; } } printf("x = %le, sqrt = %le\n", x, sqrt(x*x + y*y)); return x / sqrt(x*x + y*y); } } static void generate_random_array(int n, double a[]) { int i; for (i = 0; i < n; ++i) a[i] = (double) rand() / (double) RAND_MAX; } int main(int argc, char* argv[]) { //struct timespec start, end; unsigned long long delta_t; volatile double a; int i, j; double x, y, z1, z2; if (argc != 3) return 1; x = atof(argv[1]); y = atof(argv[2]); z1 = sinatan2(x, y); z2 = sinatan2_(x, y); printf("sin(atan2(x, y)) = %le\n", z1); printf("x / sqrt(x² + y²) = %le\n", z2); /* clock_gettime(CLOCK_MONOTONIC_RAW, &start); clock_gettime(CLOCK_MONOTONIC_RAW, &end); clock_gettime(CLOCK_MONOTONIC_RAW, &start); for (i = 0; i < MAX; ++i) { for (j = 0; j < MAX; ++j) { a = sinatan2_(A[i], A[j]); } } clock_gettime(CLOCK_MONOTONIC_RAW, &end); delta_t = end.tv_nsec - start.tv_nsec; printf("time x/sqrt(x*x + y*y) = %llu\n", delta_t); clock_gettime(CLOCK_MONOTONIC_RAW, &start); for (i = 0; i < MAX; ++i) { for (j = 0; j < MAX; ++j) { a = sinatan2(A[i], A[j]); } } clock_gettime(CLOCK_MONOTONIC_RAW, &end); delta_t = (end.tv_nsec - start.tv_nsec); printf("time sin(atan2(x, y)) = %llu\n", delta_t); */ return 0; }
# Parallelize GCC with Threads: Second Evaluation Report ## Tasks accomplished until the Second Evaluation: These are the tasks which I accomplished until now: - Keep on mapping the GIMPLE global variables. - Compile a small program in parallel. Although I aimed to have a correct version of the parallel GCC by this evaluation, I couldn't archive this result yet. Here I explain some technical problems that I faced during this phase. ## Chapter 1 -- Global Variables In this chapter, I discuss some global variables that I found while debugging my parallel prototype. By using the "algorithm" that I stated in the first evaluation and sometimes using helgrind, or aided by gdb with the crash point, I was able to find some more global variables. See the table in Appendex for more details. There is still lots of variables that I have not reached yet, and some that fell through the cracks that I only find references to these after debugging the crash. ## Chapter 2 -- Garbage Collector (GGC) I inserted a single mutex in GGC to ensure that memory allocation is done sequentially, therefore avoiding me to rewrite the GGC with threading support for now. Of course, this will require a better strategy in the future, as doing memory allocation sequentially might impact performance negatively. ## Chapter 3 -- Object Pools Some passes use object pools to store objects, avoiding to allocate and deallocate memory frequently. My previous approach was to have a threadsafe memory pool that locked a mutex each time a new object is requested or released to this pool. This turned out to be a bad approach, as (1) it serialized the requests to this pool, and (2) there were some race conditions that released all objects of the pool while another thread was still holding pointers to objects of this pool, causing memory faults. My current approach is simpler but required some code refactoring. For each thread, at ::execute time, I allocate a new memory pool, and try to release it after the pass is done. When it isn't possible, I initialize the thread pool pointer with NULL, and lazily initialize it. Unfortunately, this last case causes a memory leak, but it is under control. ## Chapter 4 -- Memory Pools In memory-block.h and memory-pool.cc, we have a memory pool which is used by object pools. This again had some race conditions and required locking when allocating and deallocating objects. The trivial solution was to use a mutex, however, this drove the compiler to be unusably slow. Therefore I used a distributed approach: instead of having a singleton memory pool for the entire program, now we have a singleton memory pool for each thread. If necessary, then I will implement a merge feature for distinct memory pools. ## Chapter 5 -- Helgrind After the commit adding the per-thread memory pool, helgrind always crashes cc1 when analyzing GCC. There is no error message other than SEGFAULT, and attaching GDB to it wasn't helpful. ## Chapter 6 -- Compiling Program I am attaching a small program which I managed to get it working on my branch. I have not measured speedups yet, and I won't be surprised if there is some part of the code generated by the compiler is wrong. ## Appendex -- Global Variable Table | file name | variable | |----------------------------|-----------------------------------------------| | bitmap.c | bitmap_default_obstack | | bitmap.c | bitmap_default_obstack_depth | | | | | gimple-match-head.c | depth (several `static` occurences | | | | | tree-ssa-sccvn.c | last_vuse_ptr | | tree-ssa-sccvn.c | vn_walk_kind | | tree-ssa-sccvn.c | default_vn_walk_kind | | tree-ssa-sccvn.c | constant_to_vaule_id | | tree-ssa-sccvn.c | constant_value_ids | | tree-ssa-sccvn.c | vn_tables_obstack | | tree-ssa-sccvn.c | vn_tables_insert_obstack | | tree-ssa-sccvn.c | shared_lookup_references | | tree-ssa-sccvn.c | last_inserted_ref | | tree-ssa-sccvn.c | last_inserted_nary | | tree-ssa-sccvn.c | last_inserted_phi | | tree-ssa-sccvn.c | valid_info | | tree-ssa-sccvn.c | next_value_id | | tree-ssa-sccvn.c | vn_ssa_aux_hash | | tree-ssa-sccvn.c | vn_ssa_aux_obstack | | tree-ssa-sccvn.c | rpo_avail | | tree-ssa-sccvn.c | vn_context_bb | | tree-ssa-sccvn.c | vn_valueize | | | | | tree-ssa-loop-ivcannon.c | loops_to_unloop | | tree-ssa-loop-ivcannon.c | loops_to_unloop_nunroll | | tree-ssa-loop-ivcannon.c | edges_to_remove | | tree-ssa-loop-ivcannon.c | peeled_loops | | | | | tree-scalar-evolution.c | nb_set_scev | | tree-scalar-evolution.c | nb_get_scev | | tree-scalar-evolution.c | chrec_not_analyzed_yet | | | | | tree-scalar-evolution.c | scalar_evolution_info | | | | | tree-ssa-threadedge.c | stmt_count | | tree-ssa-threadedge.c | ssa_name_values | | | | | tree-vrp.c | live | | tree-vrp.c | need_assert_for | | tree-vrp.c | asserts_for | | | | | tree-ssa-copy.c | copy_of | | tree-ssa-copy.c | n_copy_of | | | | | tree-ssa-dce.c | worklist | | tree-ssa-dce.c | processed | | tree-ssa-dce.c | last_stmt_necessary | | tree-ssa-dce.c | bb_contains_live_stmts | | tree-ssa-dce.c | cd | | tree-ssa-dce.c | visited_control_parents | | tree-ssa-dce.c | cfg_altered | | tree-ssa-dce.c | bb_postorder | | tree-ssa-dce.c | visited | | tree-ssa-dce.c | longest_chain | | tree-ssa-dce.c | total_chain | | tree-ssa-dce.c | nr_walks | | tree-ssa-dce.c | chain_ovfl | | | | | tree-ssa-dom.c | cfg_altered | | tree-ssa-dom.c | need_eh_cleanup | | tree-ssa-dom.c | need_noreturn_fixup | | tree-ssa-dom.c | x_vr_values | | tree-ssa-dom.c | opt_stats | | | | | tree-sra.c | access_pool | | tree-sra.c | assign_link_pool | | tree-sra.c | base_access_vec | | tree-sra.c | candidate_bitmap | | tree-sra.c | candidates | | tree-sra.c | should_scalarize_away_bitmap | | tree-sra.c | cannot_scalarize_away_bitmap | | tree-sra.c | disqualified_constants | | tree-sra.c | name_obstacks | | tree-sra.c | work_queue_head | | tree-sra.c | func_param_count | | tree-sra.c | encountered_apply_args | | tree-sra.c | encountered_recursive_call | | tree-sra.c | encountered_unchargable_recursive_call | | tree-sra.c | bb_dereferences | | tree-sra.c | final_bbs | | | | | cfg.c | bb_original | | cfg.c | bb_copy | | cfg.c | loop_copy | | cfg.c | original_copy_bb_pool | | cfg.c | block_aux_obstack | | cfg.c | first_block_aux_obj | | cfg.c | edge_aux_obstack | | cfg.c | first_edge_aux_obj | | cfg.c | initialized (several entries in some function)| | | | | gimple-ssa-isolate-paths.c | cfg_altered | | | | | tree-ssa-dse.c | need_eh_cleanup | | tree-ssa-dse.c | x_vr_values | | | | | domwalk.c | bb_postorder | | | | | tree-ssa-reasso.c | reassoc_insert_powi_p | | tree-ssa-reasso.c | operand_entry_pool | | tree-ssa-reasso.c | next_operand_entry_id; | | tree-ssa-reasso.c | bb_rank | | tree-ssa-reasso.c | operand_rank | | tree-ssa-reasso.c | reassoc_branch_fixups | | tree-ssa-reasso.c | plus_negates | | tree-ssa-reasso.c | cvec | | tree-ssa-reasso.c | repeat_factor_vec | | | | | tree-ssa.c | edge_var_maps | | | | | tree-into-ssa.c | block_defs_stack | | tree-into-ssa.c | old_ssa_names | | tree-into-ssa.c | new_ssa_names | | tree-into-ssa.c | interesting_blocks | | tree-into-ssa.c | names_to_release | | tree-into-ssa.c | phis_to_rewrite | | tree-into-ssa.c | blocks_with_phis_to_rewrite | | tree-into-ssa.c | update_ssa_initialized_fn | | tree-into-ssa.c | var_infos | | tree-into-ssa.c | info_for_ssa_name | | tree-into-ssa.c | current_info_for_ssa_name_age | | tree-into-ssa.c | update_ssa_obstack | | tree-into-ssa.c | blocks_to_update | | tree-into-ssa.c | symbols_to_rename_set | | tree-into-ssa.c | symbols_to_rename | | tree-into-ssa.c | names_to_release | | | | | tree-ssa-strlen.c | strinfo_pool | | tree-ssa-strlen.c | stridx_to_strinfo | | tree-ssa-strlen.c | decl_to_strdxlist_htab | | tree-ssa-strlen.c | strlen_to_stridx | | tree-ssa-strlen.c | stridx_obstack | | | | | tree-ssa-threadupdate.c | removed_edges | | tree-ssa-threadupdate.c | paths | | tree-ssa-threadupdate.c | thread_stats | | tree-ssa-threadupdate.c | redirection_data | | tree-ssa-threadupdate.c | dbds_ce_stop | | | | | tree-object-size.c | object_sizes | | tree-object-size.c | computed | | tree-object-size.c | offset_limit | | | | | tree-ssa-structalias.c | fake_var_decl_obstack | | tree-ssa-structalias.c | variable_info_pool | | tree-ssa-structalias.c | constraint_pool | | | | | gimple-ssa-strength-red... | cand_vec | | gimple-ssa-strength-red... | stmt_cand_map | | gimple-ssa-strength-red... | cand_obstack | | gimple-ssa-strength-red... | chain_obstack | | gimple-ssa-strength-red... | incr_vec | | gimple-ssa-strength-red... | incr_vec_len | | gimple-ssa-strength-red... | address_arithmetic_p | | gimple-ssa-strength-red... | base_cand_map | | gimple-ssa-strength-red... | name_expansions | | gimple-ssa-strength-red... | alt_base_map | | | | | tree-ssa-math-opts.c | reciprocal_stats | | tree-ssa-math-opts.c | occ_head | | tree-ssa-math-opts.c | occ_pool | | | | | tree-ssa-uncprop.c | val_ssa_equiv | | | | | tree-ssa-pre.c | next_expression_id | | tree-ssa-pre.c | expressions | | tree-ssa-pre.c | name_to_id | | tree-ssa-pre.c | do_partial_partial | | tree-ssa-pre.c | phi_translate_table | | tree-ssa-pre.c | inserted_exprs | | tree-ssa-pre.c | value_expressions | | tree-ssa-pre.c | bitmap_set_pool | | tree-ssa-pre.c | grand_bitmap_obstack | | | | | tree-tailcall.c | m_acc | | tree-tailcall.c | a_acc | | tree-tailcall.c | live_vars | | tree-tailcall.c | live_vars_vec | | | | | gimple-ssa-sprintf.c | warn_lev | | gimple-ssa-sprintf.c | lsm_tmp_name | | gimple-ssa-sprintf.c | lsm_tmp_name_length | | | | | tree-vect-generic.c | vector_inner_type | | tree-vect-generic.c | vector_last_type | | tree-vect-generic.c | vector_last_nunits | | | | | passes.c | in_gimple_form | | | | | statistics.c | statistics_hashes | | statistics.c | nr_statistics_hashes | | | | | tree-cfg.c | eh_error_found | | | | | tree-ssa-loop-im.c | lim_aux_data_map | | tree-ssa-loop-im.c | mem_ref_obstack | | | | | cfgloop.c | mfb_reis_set | | | | | tree-ssa-tailmerge.c | same_succ_htab | | tree-ssa-tailmerge.c | same_succ_edge_flags | | tree-ssa-tailmerge.c | deleted_bbs | | tree-ssa-tailmerge.c | deleted_bb_preds | | tree-ssa-tailmerge.c | worklist | | tree-ssa-tailmerge.c | all_clusters | | tree-ssa-tailmerge.c | update_bbs | | tree-ssa-tailmerge.c | ter_bitmap_obstack | | | |