Stamatis Zampetakis created HIVE-27100:
------------------------------------------
Summary: Remove unused data/files from repo
Key: HIVE-27100
URL: https://issues.apache.org/jira/browse/HIVE-27100
Project: Hive
Issue Type: Task
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Some files under [https://github.com/apache/hive/tree/master/data/files] are
not referenced anywhere else in the repo and can be removed.
Removing them makes it easier to see what is actually tested. Other minor
benefits:
* faster checkout times;
* smaller source/binary releases.
The script that was used to find which files are not referenced can be found
below:
{code:bash}
for f in `ls data/files`; do
  echo -n "$f ";
  grep -a -R "$f" --exclude-dir=".git" --exclude-dir=target --exclude=\*.q.out \
    --exclude=\*.class --exclude=\*.jar | wc -l | grep " 0$";
done
{code}
+Output+
{noformat}
cbo_t4.txt 0
cbo_t5.txt 0
cbo_t6.txt 0
compressed_4line_file1.csv.bz2 0
empty2.txt 0
filterCard.txt 0
fullouter_string_big_1a_old.txt 0
fullouter_string_small_1a_old.txt 0
futurama_episodes.avro 0
in9.txt 0
map_null_schema.avro 0
regex-path-2015-12-10_03.txt 0
regex-path-201512-10_03.txt 0
regex-path-2015121003.txt 0
sample.json 0
sample-queryplan-in-history.txt 0
sample-queryplan.txt 0
smbbucket_2.txt 0
smb_bucket_input.txt 0
SortDescCol1Col2.txt 0
SortDescCol2Col1.txt 0
sortdp.txt 0
srcsortbucket1outof4.txt 0
srcsortbucket2outof4.txt 0
srcsortbucket4outof4.txt 0
{noformat}
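
For anyone who wants to re-run the check, here is a slightly stricter variant of the scan, offered as a sketch and not as the script that produced the list above. It assumes GNU or BSD grep and is run from the repo root. The function name {{find_unreferenced}} and its two arguments are made up for illustration. It differs from the original in two ways: {{-F}} matches file names literally (so the "." in a name is not treated as a regex wildcard), and it prints only the names with no reference instead of a count per file.

```bash
#!/usr/bin/env bash
# Sketch: print the files in a data directory whose names appear nowhere in the tree.
# find_unreferenced is a hypothetical helper, not something that exists in the Hive repo.
find_unreferenced() {
  local data_dir="$1"      # directory whose files are checked, e.g. data/files
  local root="${2:-.}"     # tree to search for references, defaults to the current directory
  local path name
  for path in "$data_dir"/*; do
    [ -e "$path" ] || continue   # skip the unexpanded glob when the directory is empty
    name=$(basename "$path")
    # -a: treat binary files as text      -R: recurse into the tree
    # -F: match the name as a fixed string -q: stop at the first hit, print nothing
    if ! grep -a -R -F -q \
         --exclude-dir=.git --exclude-dir=target \
         --exclude='*.q.out' --exclude='*.class' --exclude='*.jar' \
         -e "$name" "$root"; then
      echo "$name"
    fi
  done
}
```

Usage from the repo root would be {{find_unreferenced data/files .}}. Because of the fixed-string matching, the result may contain a few more files than the list above; any candidate should still be checked by hand before removal, since a file name can be assembled dynamically in test code.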