[GitHub] [doris] github-actions[bot] commented on pull request #24234: feat: add fsst encode

via GitHub Tue, 12 Sep 2023 19:59:40 -0700


github-actions[bot] commented on PR #24234:
URL: https://github.com/apache/doris/pull/24234#issuecomment-1716869242


   #### `sh-checker report`
   
   To get the full details, please check in the [job](https://github.com/apache/doris/actions/runs/6167411582) output.
   
   <details>
   <summary>shellcheck errors</summary>
   
   ```
   
   'shellcheck ' returned error 1 finding the following syntactical issues:
   
   ----------
   
   In be/src/fsst/paper/compare.sh line 4:
     fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   
%1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, 
$6}'
     ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F 
instead.
           ^-- SC2248 (style): Prefer double quoting even when variables don't 
contain special characters.
           ^-- SC2250 (style): Prefer putting braces around variable references 
even when not strictly required.
              ^-- SC2086 (info): Double quote to prevent globbing and word 
splitting.
                   ^---^ SC2197 (info): fgrep is non-standard and deprecated. 
Use grep -F instead.
                            ^--^ SC2248 (style): Prefer double quoting even 
when variables don't contain special characters.
                                    ^---^ SC2197 (info): fgrep is non-standard 
and deprecated. Use grep -F instead.
                                             ^--^ SC2248 (style): Prefer double 
quoting even when variables don't contain special characters.
   
   Did you mean: 
     fgrep "${i}" "$1" | fgrep -v "${i}"2 | fgrep -v "${i}"pedia | awk '{ 
printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, 
$2, $8, $3, $11, $6}'
   
   
   In be/src/fsst/paper/evolution.sh line 7:
   (for i in dbtext/*; do (./cw-strncmp $i 2>&1) | awk '{ l++; if (l==3) t=$2; 
if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk 
'{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
iterative|suffix-array|dynp-matching|strncmp|scalar" }'
                                        ^-- SC2086 (info): Double quote to 
prevent globbing and word splitting.
                                        ^-- SC2250 (style): Prefer putting 
braces around variable references even when not strictly required.
   
   Did you mean: 
   (for i in dbtext/*; do (./cw-strncmp "${i}" 2>&1) | awk '{ l++; if (l==3) 
t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk 
'{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
iterative|suffix-array|dynp-matching|strncmp|scalar" }'
   
   
   In be/src/fsst/paper/evolution.sh line 8:
   (for i in dbtext/*; do (./cw $i 2>&1) | awk '{ l++; if (l==3) t=$2; if 
(l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk 
'{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
iterative|suffix-array|dynp-matching|str-as-long|scalar"}'
                                ^-- SC2086 (info): Double quote to prevent 
globbing and word splitting.
                                ^-- SC2250 (style): Prefer putting braces 
around variable references even when not strictly required.
   
   Did you mean: 
   (for i in dbtext/*; do (./cw "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if 
(l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk 
'{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
iterative|suffix-array|dynp-matching|str-as-long|scalar"}'
   
   
   In be/src/fsst/paper/evolution.sh line 9:
   (for i in dbtext/*; do (./cw-greedy $i 2>&1) | awk '{ l++; if (l==3) t=$2; 
if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk 
'{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
iterative|suffix-array|greedy-match|str-as-long|scalar" }'
                                       ^-- SC2086 (info): Double quote to 
prevent globbing and word splitting.
                                       ^-- SC2250 (style): Prefer putting 
braces around variable references even when not strictly required.
   
   Did you mean: 
   (for i in dbtext/*; do (./cw-greedy "${i}" 2>&1) | awk '{ l++; if (l==3) 
t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk 
'{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
iterative|suffix-array|greedy-match|str-as-long|scalar" }'
   
   
   In be/src/fsst/paper/evolution.sh line 10:
   (for i in dbtext/*; do (./vcw $i 2>&1) | fgrep -v target | awk '{ l++; if 
(l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk 
'{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
bottom-up|binary-search|greedy-match|str-as-long|scalar" }'
                                 ^-- SC2086 (info): Double quote to prevent 
globbing and word splitting.
                                 ^-- SC2250 (style): Prefer putting braces 
around variable references even when not strictly required.
                                            ^---^ SC2197 (info): fgrep is 
non-standard and deprecated. Use grep -F instead.
   
   Did you mean: 
   (for i in dbtext/*; do (./vcw "${i}" 2>&1) | fgrep -v target | awk '{ l++; 
if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk 
'{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
bottom-up|binary-search|greedy-match|str-as-long|scalar" }'
   
   
   In be/src/fsst/paper/evolution.sh line 11:
   (for i in dbtext/*; do (./hcw $i 511 -adaptive 2>&1) | fgrep -v target | awk 
'{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | 
awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
bottom-up|lossy-hash|greedy-match|str-as-long|branch-scalar" }'
                                 ^-- SC2086 (info): Double quote to prevent 
globbing and word splitting.
                                 ^-- SC2250 (style): Prefer putting braces 
around variable references even when not strictly required.
                                                          ^---^ SC2197 (info): 
fgrep is non-standard and deprecated. Use grep -F instead.
   
   Did you mean: 
   (for i in dbtext/*; do (./hcw "${i}" 511 -adaptive 2>&1) | fgrep -v target | 
awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; 
done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
bottom-up|lossy-hash|greedy-match|str-as-long|branch-scalar" }'
   
   
   In be/src/fsst/paper/evolution.sh line 13:
   (for i in dbtext/*; do (./hcw-opt $i 511 -adaptive 2>&1) | fgrep -v target | 
awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; 
done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
bottom-up|lossy-hash|greedy-match|str-as-long|adaptive-scalar|optimized-construction"
 }'
                                     ^-- SC2086 (info): Double quote to prevent 
globbing and word splitting.
                                     ^-- SC2250 (style): Prefer putting braces 
around variable references even when not strictly required.
                                                              ^---^ SC2197 
(info): fgrep is non-standard and deprecated. Use grep -F instead.
   
   Did you mean: 
   (for i in dbtext/*; do (./hcw-opt "${i}" 511 -adaptive 2>&1) | fgrep -v 
target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " 
d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
bottom-up|lossy-hash|greedy-match|str-as-long|adaptive-scalar|optimized-construction"
 }'
   
   
   In be/src/fsst/paper/evolution.sh line 14:
   (for i in dbtext/*; do (./hcw-opt $i 2>&1) | fgrep -v target | awk '{ l++; 
if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk 
'{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
bottom-up|lossy-hash|greedy-match|str-as-long|avx512|optimized-construction" }'
                                     ^-- SC2086 (info): Double quote to prevent 
globbing and word splitting.
                                     ^-- SC2250 (style): Prefer putting braces 
around variable references even when not strictly required.
                                                ^---^ SC2197 (info): fgrep is 
non-standard and deprecated. Use grep -F instead.
   
   Did you mean: 
   (for i in dbtext/*; do (./hcw-opt "${i}" 2>&1) | fgrep -v target | awk '{ 
l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | 
awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
bottom-up|lossy-hash|greedy-match|str-as-long|avx512|optimized-construction" }'
   
   
   In be/src/fsst/paper/kernels.sh line 1:
   #/bin/bash
    ^-- SC1113 (error): Use #!, not just #, for the shebang.
   
   
   In be/src/fsst/paper/kernels.sh line 4:
   echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf 
\"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
        ^-----^ SC2086 (info): Double quote to prevent globbing and word 
splitting.
        ^-----^ SC2250 (style): Prefer putting braces around variable 
references even when not strictly required.
   
   Did you mean: 
   echo "${PARAMS}" | awk "{for(i=1;i<=NF;i++) printf 
\"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
   
   
   In be/src/fsst/paper/kernels.sh line 5:
   echo "\\\\"
        ^----^ SC2028 (info): echo may not expand escape sequences. Use printf.
   
   
   In be/src/fsst/paper/kernels.sh line 10:
      for m in $PARAMS
               ^-----^ SC2250 (style): Prefer putting braces around variable 
references even when not strictly required.
   
   Did you mean: 
      for m in ${PARAMS}
   
   
   In be/src/fsst/paper/kernels.sh line 12:
        (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf 
"%f ", $2 }'
                          ^-- SC2248 (style): Prefer double quoting even when 
variables don't contain special characters.
                          ^-- SC2250 (style): Prefer putting braces around 
variable references even when not strictly required.
                                  ^-- SC2086 (info): Double quote to prevent 
globbing and word splitting.
                                  ^-- SC2250 (style): Prefer putting braces 
around variable references even when not strictly required.
   
   Did you mean: 
        (./hcw-opt dbtext/"${i}" 511 -"${m}" 2>&1) | tail -2 | head -1 | awk '{ 
printf "%f ", $2 }'
   
   
   In be/src/fsst/paper/kernels.sh line 14:
      echo $i
           ^-- SC2248 (style): Prefer double quoting even when variables don't 
contain special characters.
           ^-- SC2250 (style): Prefer putting braces around variable references 
even when not strictly required.
   
   Did you mean: 
      echo "${i}"
   
   
   In be/src/fsst/paper/lz4-smallblocks.sh line 3:
   dd if=$1 of=tmpsplit.out bs=$maxsize count=1 2> /dev/null
         ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                               ^------^ SC2248 (style): Prefer double quoting 
even when variables don't contain special characters.
                               ^------^ SC2250 (style): Prefer putting braces 
around variable references even when not strictly required.
   
   Did you mean: 
   dd if="$1" of=tmpsplit.out bs="${maxsize}" count=1 2> /dev/null
   
   
   In be/src/fsst/paper/lz4-smallblocks.sh line 5:
       mkdir tmpsplit$blocksize
                     ^--------^ SC2086 (info): Double quote to prevent globbing 
and word splitting.
                     ^--------^ SC2250 (style): Prefer putting braces around 
variable references even when not strictly required.
   
   Did you mean: 
       mkdir tmpsplit"${blocksize}"
   
   
   In be/src/fsst/paper/lz4-smallblocks.sh line 6:
       split -b $blocksize tmpsplit.out tmpsplit$blocksize/x
                ^--------^ SC2086 (info): Double quote to prevent globbing and 
word splitting.
                ^--------^ SC2250 (style): Prefer putting braces around 
variable references even when not strictly required.
                                                ^--------^ SC2086 (info): 
Double quote to prevent globbing and word splitting.
                                                ^--------^ SC2250 (style): 
Prefer putting braces around variable references even when not strictly 
required.
   
   Did you mean: 
       split -b "${blocksize}" tmpsplit.out tmpsplit"${blocksize}"/x
   
   
   In be/src/fsst/paper/lz4-smallblocks.sh line 7:
       echo -n $blocksize ""
               ^--------^ SC2086 (info): Double quote to prevent globbing and 
word splitting.
               ^--------^ SC2250 (style): Prefer putting braces around variable 
references even when not strictly required.
   
   Did you mean: 
       echo -n "${blocksize}" ""
   
   
   In be/src/fsst/paper/lz4-smallblocks.sh line 8:
       size=$((for f in tmpsplit$blocksize/x*; do lz4 -c $f | wc -c; done) | awk '{s+=$1} END {print s}')
            ^-- SC1102 (error): Shells disambiguate $(( differently or not at all. For $(command substitution), add space after $( . For $((arithmetics)), fix parsing errors.
                                ^--------^ SC2231 (info): Quote expansions in this for loop glob to prevent wordsplitting, e.g. "$dir"/*.txt .
                                ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                         ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                                         ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
   
   Did you mean: 
       size=$((for f in tmpsplit${blocksize}/x*; do lz4 -c "${f}" | wc -c; done) | awk '{s+=$1} END {print s}')
   
   
   In be/src/fsst/paper/lz4-smallblocks.sh line 9:
       echo "$maxsize / $size" | bc -l
             ^------^ SC2250 (style): Prefer putting braces around variable 
references even when not strictly required.
                        ^---^ SC2250 (style): Prefer putting braces around 
variable references even when not strictly required.
   
   Did you mean: 
       echo "${maxsize} / ${size}" | bc -l
   
   
   In be/src/fsst/paper/lz4-smallblocks.sh line 10:
       rm -rf tmpsplit$blocksize/
                      ^--------^ SC2086 (info): Double quote to prevent 
globbing and word splitting.
                      ^--------^ SC2250 (style): Prefer putting braces around 
variable references even when not strictly required.
   
   Did you mean: 
       rm -rf tmpsplit"${blocksize}"/
   
   
   In be/src/fsst/paper/sorted.sh line 8:
   cd dbtext
   ^-------^ SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
   
   Did you mean: 
   cd dbtext || exit
   
   
   In be/src/fsst/paper/sorted.sh line 11:
     sort $i > ../.sorted/$i; 
          ^-- SC2086 (info): Double quote to prevent globbing and word 
splitting.
          ^-- SC2250 (style): Prefer putting braces around variable references 
even when not strictly required.
                          ^-- SC2086 (info): Double quote to prevent globbing 
and word splitting.
                          ^-- SC2250 (style): Prefer putting braces around 
variable references even when not strictly required.
   
   Did you mean: 
     sort "${i}" > ../.sorted/"${i}"; 
   
   
   In be/src/fsst/paper/sorted.sh line 14:
   cd ..
   ^---^ SC2103 (info): Use a ( subshell ) to avoid having to cd back.
   
   
   In be/src/fsst/paper/sorted.sh line 19:
     ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f 
%1.2f ",$1,$2,$7}'
                                      ^-- SC2248 (style): Prefer double quoting 
even when variables don't contain special characters.
                                      ^-- SC2250 (style): Prefer putting braces 
around variable references even when not strictly required.
   
   Did you mean: 
     ./filtertest compare 1000 dbtext/"${i}" | tail -1 | awk '{ printf "% 16s 
%1.2f %1.2f ",$1,$2,$7}'
   
   
   In be/src/fsst/paper/sorted.sh line 20:
     ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f 
%1.2f\n",$2,$7}'
                                       ^-- SC2248 (style): Prefer double 
quoting even when variables don't contain special characters.
                                       ^-- SC2250 (style): Prefer putting 
braces around variable references even when not strictly required.
   
   Did you mean: 
     ./filtertest compare 1000 .sorted/"${i}" | tail -1 | awk '{ printf "%1.2f 
%1.2f\n",$2,$7}'
   
   For more information:
     https://www.shellcheck.net/wiki/SC1102 -- Shells disambiguate $(( 
different...
     https://www.shellcheck.net/wiki/SC1113 -- Use #!, not just #, for the 
sheba...
     https://www.shellcheck.net/wiki/SC2164 -- Use 'cd ... || exit' or 'cd ... 
|...
   ----------
   
   You can address the above issues in one of three ways:
   1. Manually correct the issue in the offending shell script;
   2. Disable specific issues by adding the comment:
     # shellcheck disable=NNNN
   above the line that contains the issue, where NNNN is the error code;
   3. Add '-e NNNN' to the SHELLCHECK_OPTS setting in your .yml action file.
   
   
   
   ```
   </details>
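
The three findings above error or warning level (SC1113, SC1102, SC2164) could be addressed along the lines of the sketch below. This is only an illustration, not the PR's actual fix: the function names `total_size` and `sort_into` are hypothetical, and `wc -c` stands in for the `lz4 -c ... | wc -c` pipeline so the snippet runs without lz4 installed.

```shell
#!/bin/bash
# SC1113: the shebang above starts with "#!", not just "#".
# Hypothetical helpers; names and the use of plain `wc -c` are assumptions.

# SC1102: the space after "$(" makes this a command substitution that wraps
# a subshell, instead of the start of an arithmetic expansion "$((".
total_size() {
    local dir="$1"
    local size
    size=$( (for f in "${dir}"/x*; do wc -c <"${f}"; done) |
        awk '{s+=$1} END {print s+0}')
    echo "${size}"
}

# SC2164 / SC2103: cd runs inside a subshell and aborts if it fails, so the
# caller's working directory is untouched and no `cd ..` is needed afterwards.
sort_into() {
    local src="$1"
    local dst="$2"
    local dst_abs
    mkdir -p "${dst}" || return 1
    # Resolve the destination first, because relative paths change after cd.
    dst_abs=$(cd "${dst}" && pwd) || return 1
    (
        cd "${src}" || exit 1
        for i in *; do
            sort "${i}" >"${dst_abs}/${i}"
        done
    )
}
```

For example, `sort_into dbtext .sorted` would mirror the loop in `sorted.sh`, and `total_size "tmpsplit${blocksize}"` the size computation in `lz4-smallblocks.sh`, under the assumptions stated above.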
   
   <details>
   <summary>shfmt errors</summary>
   
   ```
   
   'shfmt ' returned error 1 finding the following formatting issues:
   
   ----------
   --- be/src/fsst/paper/compare.sh.orig
   +++ be/src/fsst/paper/compare.sh
   @@ -1,5 +1,4 @@
    #!/bin/bash
   -(for i in hex yago email wiki uuid urls2 urls firstname lastname city 
credentials street movies faust hamlet chinese japanese wikipedia genome 
location c_name l_commen ps_comment 
   - do
   -  fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s  
 %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, 
$6}'
   - done) | awk '{print$0;k++;for(i=2;i<=NF;i++) r[i]+=$i;}END{printf "% 16s   
%1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", 
"AVG",r[2]/k,r[3]/k,r[4]/k,r[5]/k,r[6]/k,r[7]/k,r[8]/k}'
   +(for i in hex yago email wiki uuid urls2 urls firstname lastname city 
credentials street movies faust hamlet chinese japanese wikipedia genome 
location c_name l_commen ps_comment; do
   +    fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 
16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, 
$11, $6}'
   +done) | awk '{print$0;k++;for(i=2;i<=NF;i++) r[i]+=$i;}END{printf "% 16s   
%1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", 
"AVG",r[2]/k,r[3]/k,r[4]/k,r[5]/k,r[6]/k,r[7]/k,r[8]/k}'
   --- be/src/fsst/paper/evolution.sh.orig
   +++ be/src/fsst/paper/evolution.sh
   @@ -1,7 +1,7 @@
    #!/bin/bash
    # output format: STCB CCB CR
    # STCB: symbol table construction cost in cycles-per-compressed byte 
(constructing a new ST per 8MB text)
   -# CCB:  compression speed cycles-per-compressed byte 
   +# CCB:  compression speed cycles-per-compressed byte
    # CR:   compression (=size reduction) factor achieved
    
    (for i in dbtext/*; do (./cw-strncmp $i 2>&1) | awk '{ l++; if (l==3) t=$2; 
if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk 
'{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " 
iterative|suffix-array|dynp-matching|strncmp|scalar" }'
   @@ -16,10 +16,10 @@
    # on Intel SKX CPUs| the results look like:
    #
    # 75.117,160.11,1.97194 iterative|suffix-array|dynp-matching|strncmp|scalar
   -#   \--> 160 cycles per byte produces a very slow compression speed (say 
~20MB/s on a 3Ghz CPU) 
   +#   \--> 160 cycles per byte produces a very slow compression speed (say 
~20MB/s on a 3Ghz CPU)
    #
    # 73.6948,81.6404,1.97194 
iterative|suffix-array|dynp-matching|str-as-long|scalar
   -#   \--> str-as-long (i.e. FSST focusing on 8-byte word symbols) improves 
compression speed 2x 
   +#   \--> str-as-long (i.e. FSST focusing on 8-byte word symbols) improves 
compression speed 2x
    #
    # 74.4996,37.457,1.94764 
iterative|suffix-array|greedy-match|str-as-long|scalar
    #   \--> dynamic programming brought only 3% smaller size. So drop it and 
gain another 2x compression speed.
   @@ -28,7 +28,7 @@
    #   \--> bottom-up is *really* better in terms of compression factor than 
iterative with suffix array.
    #
    # 1.74783,10.7009,2.28103 
bottom-up|lossy-hash|greedy-match|str-as-long|scalar-branch
   -#   \--> hashing significantly improves compression speed at only 5% size 
cost (due to hash collisions) 
   +#   \--> hashing significantly improves compression speed at only 5% size 
cost (due to hash collisions)
    #
    # 1.74783,9.8142,2.28103 
bottom-up|lossy-hash|greedy-match|str-as-long|scalar-adaptive
    #   \--> adaptive use of encoding kernels gives compression speed a small 
bump
   @@ -39,4 +39,4 @@
    # optimized construction refers to the combination of three changes:
    # - reducing the amount of bottom-up passes from 10 to 5 (less learning 
time, but.. slighty worsens CR)
    # - looking at subsamples in early rounds (increasing the sample as the 
rounds go up). Less compression work.
   -# - splitting the counters for less cache pressure and aiding fast skipping 
over counts-of-0 
   +# - splitting the counters for less cache pressure and aiding fast skipping 
over counts-of-0
   --- be/src/fsst/paper/kernels.sh.orig
   +++ be/src/fsst/paper/kernels.sh
   @@ -1,15 +1,15 @@
    #/bin/bash
    PARAMS='simd1 simd2 simd3 simd4 adaptive'
   -(echo | awk '{ print "{\\begin{tabular}{|rrrr|r|l|}\n\\hline"}'
   -echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf 
\"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
   -echo "\\\\"
   -echo "\\hline"
   -echo "\\hline"
   -(for i in hex yago email wiki uuid urls2 urls firstname lastname city 
credentials street movies faust hamlet chinese japanese wikipedia genome 
location c_name l_comment ps_comment 
   - do 
   -   for m in $PARAMS
   -   do
   -     (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf 
"%f ", $2 }'
   -   done
   -   echo $i
   - done) | awk '{for(i=1;i<NF;i++){r[i]+=$i;printf 
"{\\footnotesize{X%d%5.2f}}& ",i,$i}k++;printf "{\\footnotesize 
%s}\\\\\n",$NF}END{print "\\hline"; for(j=1;j<i;j++)printf 
"{\\footnotesize{X%d%5.2f}}& ",j,r[j]/k;print "{\\footnotesize 
average}\\\\\n\\hline\n\\end{tabular}}"}' | sed 's/_/\\_/g' | sed 
's/[0-9]*-//') | sed 's/X[38]/\\bf /g' | sed 's/X[1-9]//g' | sed 
's/adaptive/scalar/' 
   +(
   +    echo | awk '{ print "{\\begin{tabular}{|rrrr|r|l|}\n\\hline"}'
   +    echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf 
\"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
   +    echo "\\\\"
   +    echo "\\hline"
   +    echo "\\hline"
   +    (for i in hex yago email wiki uuid urls2 urls firstname lastname city 
credentials street movies faust hamlet chinese japanese wikipedia genome 
location c_name l_comment ps_comment; do
   +        for m in $PARAMS; do
   +            (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ 
printf "%f ", $2 }'
   +        done
   +        echo $i
   +    done) | awk '{for(i=1;i<NF;i++){r[i]+=$i;printf 
"{\\footnotesize{X%d%5.2f}}& ",i,$i}k++;printf "{\\footnotesize 
%s}\\\\\n",$NF}END{print "\\hline"; for(j=1;j<i;j++)printf 
"{\\footnotesize{X%d%5.2f}}& ",j,r[j]/k;print "{\\footnotesize 
average}\\\\\n\\hline\n\\end{tabular}}"}' | sed 's/_/\\_/g' | sed 's/[0-9]*-//'
   +) | sed 's/X[38]/\\bf /g' | sed 's/X[1-9]//g' | sed 's/adaptive/scalar/'
   be/src/fsst/paper/lz4-smallblocks.sh:8:17: not a valid arithmetic operator: f
   --- be/src/fsst/paper/sorted.sh.orig
   +++ be/src/fsst/paper/sorted.sh
   @@ -6,17 +6,15 @@
    rm -rf .sorted 2>/dev/null
    mkdir .sorted
    cd dbtext
   -for i in * 
   -do 
   -  sort $i > ../.sorted/$i; 
   +for i in *; do
   +    sort $i >../.sorted/$i
    done
    cp chinese japanese faust hamlet ../.sorted/
    cd ..
    
    # note sizes, display stats
   -(for i in hex yago email wiki uuid urls2 urls firstname lastname city 
credentials street movies faust hamlet chinese japanese wikipedia genome 
location c_name l_comment ps_comment
   - do 
   -  ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s 
%1.2f %1.2f ",$1,$2,$7}'
   -  ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f 
%1.2f\n",$2,$7}'
   - done) | 
   -awk '{ s1+=$2; s2+=$3; s3+=$4; s4+=$5; k++; print $0} END {printf "% 16s 
%1.2f% 1.2f %1.2f %1.2f\n", "avg",s1/k, s2/k, s3/k, s4/k}'
   +(for i in hex yago email wiki uuid urls2 urls firstname lastname city 
credentials street movies faust hamlet chinese japanese wikipedia genome 
location c_name l_comment ps_comment; do
   +    ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s 
%1.2f %1.2f ",$1,$2,$7}'
   +    ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f 
%1.2f\n",$2,$7}'
   +done) |
   +    awk '{ s1+=$2; s2+=$3; s3+=$4; s4+=$5; k++; print $0} END {printf "% 
16s %1.2f% 1.2f %1.2f %1.2f\n", "avg",s1/k, s2/k, s3/k, s4/k}'
   ----------
   
   You can reformat the above files to meet shfmt's requirements by typing:
   
     shfmt  -w filename
   
   
   ```
   </details>
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

