Thomas Rebele created HIVE-29204:
------------------------------------

             Summary: Hive-site: cleanup attachments and links to attachments
                 Key: HIVE-29204
                 URL: https://issues.apache.org/jira/browse/HIVE-29204
             Project: Hive
          Issue Type: Task
            Reporter: Thomas Rebele


Some links to attachments lead to a 404 Not Found, e.g.
[attachments/40509928/42696874-txt|https://hive.apache.org/attachments/40509928/42696874-txt]
in [SQL Standard Based Hive Authorization|https://hive.apache.org/docs/latest/language/sql-standard-based-hive-authorization/#hive-013].

In some link texts the dot before the file extension is replaced with a dash
(e.g., in content/community/resources/presentations.md). In general, it would be
better to use the title of the document, rather than numbers, as the file name
and link text.
{code:java}
50:* [attachments/27362054/35193149-pptx](/attachments/27362054/35193149.pptx) (Ashutosh Chauhan)
{code}
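Link texts of this shape could be located with a recursive grep. This is only a sketch: the function name `find_dashed_link_texts` is made up for illustration, and the pattern assumes the affected link texts always look like `attachments/<number>/<number>-<extension>`, which has not been checked against the whole site.

```shell
# Print file:line:content for every Markdown link whose text looks like
# "attachments/<id>/<id>-<ext>", i.e. the dot before the extension became a dash.
# $1 is the directory to search (for example the content/ directory).
find_dashed_link_texts() {
  grep -rnE '\[attachments/[0-9]+/[0-9]+-[a-z0-9]+\]' "$1"
}
```

For example, `find_dashed_link_texts content` would list the affected lines, including the one from presentations.md quoted above.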
A few shell commands that might be helpful:
{code:java}
# attachment files that exist in the site's static directory
find themes/hive/static/attachments -type f | sed 's#themes/hive/static/##' |
  sort -u > available-attachments.txt
# attachment paths that are referenced anywhere in the repository
rg "attachments/" | sed 's#attachments/#\nattachments/#g;' |
  grep '^attachments' | sed 's/\([?"<> )]\|\]\).*//' | sort -u > needed-attachments.txt
{code}
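The two lists can then be compared with `comm` to get the broken links and the unreferenced files. A minimal sketch, assuming both files are sorted (as they are when produced with `sort -u` in the same locale); the function names `list_missing` and `list_unused` are hypothetical helpers, not part of the repository.

```shell
# list_missing AVAILABLE NEEDED
# Paths that are referenced but have no file behind them (likely 404s):
# lines that appear only in the second file.
list_missing() {
  comm -13 "$1" "$2"
}

# list_unused AVAILABLE NEEDED
# Files that exist but are never referenced (candidates for removal):
# lines that appear only in the first file.
list_unused() {
  comm -23 "$1" "$2"
}
```

Usage would be `list_missing available-attachments.txt needed-attachments.txt`, and likewise for `list_unused`.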
There are also some duplicate files:
{code:java}
$ sed 's#^#themes/hive/static/#' available-attachments.txt | xargs md5sum | sort
...
f9f26fe37b0c5276d0b63f98e1188324  themes/hive/static/attachments/27362075/34177489.pdf
f9f26fe37b0c5276d0b63f98e1188324  themes/hive/static/attachments/27362075/34177517.pdf
f9f26fe37b0c5276d0b63f98e1188324  themes/hive/static/attachments/27362075/35193010.pdf
f9f26fe37b0c5276d0b63f98e1188324  themes/hive/static/attachments/27362075/35193011.pdf
...
{code}
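To see only the duplicates instead of scanning the full sorted list, the checksum output can be filtered with `uniq`. This is a sketch that assumes GNU coreutils and findutils (`md5sum`, `uniq -w`, `uniq --all-repeated`, `xargs -r`); `list_duplicates` is a hypothetical helper name.

```shell
# list_duplicates DIR
# Print groups of files under DIR that share the same MD5 checksum.
# uniq compares only the first 32 characters (the hex digest) and prints
# every member of each repeated group, with a blank line between groups.
list_duplicates() {
  find "$1" -type f -print0 | xargs -0 -r md5sum | sort |
    uniq -w32 --all-repeated=separate
}
```

For example, `list_duplicates themes/hive/static/attachments` would print the four PDF files shown above as one group.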



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
