Stamatis Zampetakis created HIVE-27100:
------------------------------------------

             Summary: Remove unused data/files from repo
                 Key: HIVE-27100
                 URL: https://issues.apache.org/jira/browse/HIVE-27100
             Project: Hive
          Issue Type: Task
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Some files under [https://github.com/apache/hive/tree/master/data/files] are 
not referenced anywhere else in the repo and can be removed.

Removing them makes it easier to see what is actually tested. Other minor 
benefits:
 * faster checkout times;
 * smaller source/binary releases.

The following script, run from the repository root, was used to find the 
unreferenced files:
{code:bash}
for f in `ls data/files`; do
  echo -n "$f "; 
  grep -a -R "$f" --exclude-dir=".git" --exclude-dir=target --exclude=\*.q.out \
    --exclude=\*.class --exclude=\*.jar | wc -l | grep " 0$";
done
{code}
+Output+
{noformat}
cbo_t4.txt 0
cbo_t5.txt 0
cbo_t6.txt 0
compressed_4line_file1.csv.bz2 0
empty2.txt 0
filterCard.txt 0
fullouter_string_big_1a_old.txt 0
fullouter_string_small_1a_old.txt 0
futurama_episodes.avro 0
in9.txt 0
map_null_schema.avro 0
regex-path-2015-12-10_03.txt 0
regex-path-201512-10_03.txt 0
regex-path-2015121003.txt 0
sample.json 0
sample-queryplan-in-history.txt 0
sample-queryplan.txt 0
smbbucket_2.txt 0
smb_bucket_input.txt 0
SortDescCol1Col2.txt 0
SortDescCol2Col1.txt 0
sortdp.txt 0
srcsortbucket1outof4.txt 0
srcsortbucket2outof4.txt 0
srcsortbucket4outof4.txt 0
{noformat}
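
The same check can also be sketched as a small shell function. This is only an illustrative variant, not the script that produced the listing above: the function name `list_unreferenced` and its two arguments are hypothetical. It matches file names literally with `-F`, so a `.` in a name is not treated as a regex wildcard, and it relies on the exit status of `grep -q` instead of the output format of `wc -l`. Because of the literal matching, its results may differ slightly from the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Hive repo): print the name of every
# regular file in REPO_ROOT/DATA_DIR that is not mentioned in any file
# under REPO_ROOT.
list_unreferenced() {
  local repo_root=$1 data_dir=$2 path name
  for path in "$repo_root/$data_dir"/*; do
    [ -f "$path" ] || continue   # skip subdirectories and unmatched globs
    name=$(basename "$path")
    # -a: treat binary files as text, -R: recurse, -s: hide read errors,
    # -q: stop at the first hit, -F: match the name as a literal string.
    if ! grep -a -R -s -q -F \
         --exclude-dir=.git --exclude-dir=target \
         --exclude='*.q.out' --exclude='*.class' --exclude='*.jar' \
         -e "$name" "$repo_root"; then
      echo "$name"
    fi
  done
}
```

Assuming the function has been loaded into the current shell, it could be called from the repository root as `list_unreferenced . data/files`.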



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
