[ 
https://issues.apache.org/jira/browse/CAMEL-23781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089658#comment-18089658
 ] 

ASF GitHub Bot commented on CAMEL-23781:
----------------------------------------

k-krawczyk commented on PR #1666:
URL: https://github.com/apache/camel-website/pull/1666#issuecomment-4730733995

   @davsclaus good call — I measured it against the live site rather than 
guessing:
   
   - ~5,570 doc pages in the sitemap (already spanning multiple versions: 
`next`, `4.18.x`, `4.14.x`, …)
   - average `.md` ~40–86 KB (component pages with big option tables are the 
large ones)
   - real zip ratio measured on a 12-page sample: **~26.5%**
   
   So uncompressed is ~250–480 MB (that's the ~500 MB you expected), and the 
**`.zip` lands around ~70–130 MB**. Plain `zip` compresses each file 
independently, so the cross-version duplication doesn't shrink it much; a 
solid `.tar.gz`, which compresses the whole archive as one stream, would be 
smaller. If the on-disk build keeps more versions than the sitemap exposes, 
it'd scale up proportionally.
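
The size estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the page count, average page size and zip ratio are the figures quoted in this comment, while the function name and the 1 MB = 1000 KB convention are assumptions made for illustration. The low end comes out slightly below the rounded figures above (roughly 223 MB uncompressed and 59 MB zipped); the high end matches.

```python
# Figures quoted in the comment; names and units are illustrative assumptions.
PAGES = 5_570                     # doc pages found in the sitemap
AVG_KB_LOW, AVG_KB_HIGH = 40, 86  # average .md size range, in KB
ZIP_RATIO = 0.265                 # zip ratio measured on the 12-page sample


def estimate_mb(pages: int, avg_kb: float, ratio: float = 1.0) -> float:
    """Estimated bundle size in MB, assuming 1 MB = 1000 KB."""
    return pages * avg_kb * ratio / 1000


raw_low = estimate_mb(PAGES, AVG_KB_LOW)
raw_high = estimate_mb(PAGES, AVG_KB_HIGH)
zip_low = estimate_mb(PAGES, AVG_KB_LOW, ZIP_RATIO)
zip_high = estimate_mb(PAGES, AVG_KB_HIGH, ZIP_RATIO)

print(f"uncompressed: {raw_low:.0f}-{raw_high:.0f} MB")
print(f"zipped:       {zip_low:.0f}-{zip_high:.0f} MB")
```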
   
   So I agree it's too big to commit into `public/` and redeploy on every 
change. Options:
   1. Build it only on release (or a scheduled job) and publish it as a 
**GitHub Release asset**, with `llms.txt` pointing at that URL. The project 
already consumes release binaries via the `github-release-binary` yarn plugin, 
so this fits the existing distribution model.
   2. Ship `.tar.gz` instead of `.zip` to roughly halve the size.
   3. Split into smaller per-area bundles.
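
The reasoning behind option 2 can be illustrated with a toy comparison using only the Python standard library. The version directories, page names and page sizes below are hypothetical, and the pages are deliberately small and stored next to their duplicates. DEFLATE can only refer back about 32 KB, so with real 40–86 KB pages the saving from `.tar.gz` may well be smaller than in this sketch and would need to be measured on the actual build.

```python
import io
import random
import tarfile
import zipfile


def make_pages(n_pages=8, size=4096, seed=0):
    """Build hypothetical pages, each duplicated under two version dirs."""
    rng = random.Random(seed)
    files = []
    for i in range(n_pages):
        # Random hex text stands in for page content (illustrative only).
        body = "".join(rng.choices("0123456789abcdef", k=size)).encode()
        # Identical copies are stored back to back, well within 32 KB.
        files.append((f"next/page{i}.md", body))
        files.append((f"4.18.x/page{i}.md", body))
    return files


def zip_size(files):
    """Size of a zip archive: every member is compressed on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files:
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_gz_size(files):
    """Size of a tar.gz archive: one gzip stream over the whole tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


files = make_pages()
print("zip:   ", zip_size(files), "bytes")
print("tar.gz:", tar_gz_size(files), "bytes")
```

In this setup the second copy of each page is encoded mostly as back-references in the `.tar.gz`, while the zip pays for both copies in full.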
   
   I'm happy to rework this PR towards (1). Which distribution mechanism do you 
prefer?
   
   _Reported by Claude Code on behalf of Karol Krawczyk_




> camel-website - Offline zip for offline coding agents
> -----------------------------------------------------
>
>                 Key: CAMEL-23781
>                 URL: https://issues.apache.org/jira/browse/CAMEL-23781
>             Project: Camel
>          Issue Type: New Feature
>          Components: camel-ai, website
>            Reporter: Claus Ibsen
>            Assignee: Karol Krawczyk
>            Priority: Major
>             Fix For: 4.x
>
>
> [https://github.com/apache/camel/pull/24063]
> Companies may have restricted their AI coding agents from accessing the 
> internet, or allow only controlled access. But even with controlled access it 
> may take time for a company to add camel.apache.org to its allow list.
> Maybe we can have an offline website .zip for AIs that has the website 
> structure and only the .md files that coding agents need. Then an agent can 
> source the information from there by unzipping the file to local disk in /tmp.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
