The CUDA implementation is currently rotting in
algorithm/A51/implementation/cuda/kernel/bitslice.hpp

The file you mentioned is not used anywhere, but it is indeed a bug.
It's not the only one in that file.

The CUDA code is currently not functional in any file in the repository.

>I recently adopted the bitslice approach in my own software and found
>a useless instruction generated by a macro that affects the
>performance of the CUDA implementation.
>
>See: /algorithm/A51/implementation/common/partitioned_bitslice.hpp
>
>Revision 79, Line: 115
>
>BOOST_PP_REPEAT(23, pbs_clock_r3,);
>
>22 repetitions are enough:
>
>BOOST_PP_REPEAT(22, pbs_clock_r3,);
>
>Luckily, it does not affect correctness. Because "If
>the difference between x and y is less than 0, the result is saturated
>to 0.", no error is produced. The extra repetition only generates an
>unnecessary r3_0 assignment that has no effect:
>
>r3_0 = r3_0 & not_clock_r3 | r3_0 & do_clock_r3;
>
>If the compiler recognizes that not_clock_r3 and do_clock_r3 are
>complements, it may omit this statement.
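
To make the point above concrete, here is a minimal sketch, not the
repository code. It assumes a bitsliced register bit is a 64-bit word,
that not_clock_r3 is the exact bitwise complement of do_clock_r3, and
that R3 is shifted by moving bit i-1 into bit i wherever the clock mask
is set. The names slice_t, redundant_step and shift_r3 are made up for
this illustration. The quoted sentence about saturation presumably comes
from the Boost.Preprocessor subtraction documentation, which would
explain why the extra repetition lands on index 0 instead of a negative
index.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one bitsliced register bit: each of the 64 bit
// positions is assumed to belong to a different A5/1 instance.
using slice_t = std::uint64_t;

// The statement produced by the extra (23rd) repetition: r3_0 appears on
// both sides, so with complementary masks the result is r3_0 itself.
inline slice_t redundant_step(slice_t r3_0, slice_t do_clock_r3) {
    const slice_t not_clock_r3 = ~do_clock_r3;  // assumed exact complement
    return (r3_0 & not_clock_r3) | (r3_0 & do_clock_r3);
}

// Conditional shift of a 23-bit register held as 23 slices. Only indices
// 22..1 have a predecessor, so 22 steps cover the whole shift. Bit 0 would
// be filled by the feedback, which is deliberately left out of this sketch.
inline void shift_r3(std::array<slice_t, 23>& r, slice_t do_clock) {
    const slice_t not_clock = ~do_clock;
    for (int i = 22; i >= 1; --i) {
        r[i] = (r[i] & not_clock) | (r[i - 1] & do_clock);
    }
}

// Checks the identity on a batch of pseudo-random inputs (xorshift64).
inline bool redundant_step_is_identity() {
    slice_t x = 0x9E3779B97F4A7C15ULL;
    for (int n = 0; n < 1000; ++n) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        const slice_t value = x;
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        const slice_t mask = x;
        if (redundant_step(value, mask) != value) return false;
    }
    return true;
}
```

Since (x & ~m) | (x & m) equals x for every x and m, an optimizer that
can see the two masks are complements is free to drop the statement. If
the masks come from separate computations, it may not be able to prove
that, so removing the repetition in the source is the safer fix.
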
>_______________________________________________
>A51 mailing list
>A51@lists.reflextor.com
>http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
