[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents

2024-03-19 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

Joe Orton  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

--- Comment #17 from Joe Orton  ---
Merged to 2.4.x in r1916412

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: bugs-unsubscr...@httpd.apache.org
For additional commands, e-mail: bugs-h...@httpd.apache.org



2024-02-07 Thread bugzilla

Joe Orton  changed:

   What|Removed |Added

 Status|NEEDINFO|RESOLVED
 Resolution|--- |FIXED

--- Comment #16 from Joe Orton  ---
Second change merged in r1915625 and will propose for backport.

Thanks to Joseph and others who've provided input here.

2024-02-07 Thread bugzilla

--- Comment #15 from Joe Orton  ---
So in fact I should have been testing on 2.4 rather than trunk and I would not
have wasted half a day on this! :)

https://github.com/apache/httpd/commit/c57a036dc3e116f5a397bd6a97da77dd6b503a83
significantly changes the behaviour of mod_xml2enc on trunk (this patch is not
in 2.4.x) which led me wildly astray.

It is indeed simple to reproduce the content transformation suggested above. I
am still not sure how the module is "supposed" to behave, which seems quite
ill-defined.

2024-02-07 Thread bugzilla

--- Comment #14 from Joe Orton  ---
Thanks. I think what I was missing here is that mod_proxy_html/mod_xml2enc is
not actually transforming the content of the responses, which can really only
happen for HTML documents if I'm following the code correctly.

The bug is "merely" adding "charset=utf-8" unnecessarily, which then changes
the client interpretation of the response data so that it appears corrupt when
it would otherwise have been correctly interpreted as (e.g.) ISO-8859-1.

I think with the latest patch in https://github.com/apache/httpd/pull/409 the
behaviour is not changed for most cases compared to the current 2.4.58 code.

2024-02-06 Thread bugzilla

--- Comment #13 from Joseph Heenan  ---
@Joe - your patch looks sensible to me.

I don't have a minimum config; the corruption happened on a pretty lightly
configured apache2 instance on Ubuntu 20.04.6. The important part of the config
is likely to be this:


  ProxyPass http://intranet.example.com/
  ProxyPassReverse http://intranet.example.com/
  ProxyHTMLEnable On
  ProxyHTMLExtended On
  ProxyHTMLURLMap / /intranet/ ce


2024-02-06 Thread bugzilla

--- Comment #12 from Joe Orton  ---
Rereading this again, maybe Nick was just referring to application/xml which is
wrongly excluded after 1884505. 

I still think we should follow RFC 7303 here rather than matching "xml" at any
point in the content-type string. I have another PR here which does this:

https://github.com/apache/httpd/pull/409/commits/19ed165e19cf142e65715d1b71c68da14f7873c4

I am trying to write a test case to exercise this, but it's not obvious what
minimal configuration forces mod_proxy_html to transform response data (i.e.
corrupting it by trying to transform ISO-8859-1 to UTF-8) - does anybody have a
minimal config?

2023-11-22 Thread bugzilla

--- Comment #11 from Joe Orton  ---
A user who hit this also confirmed that not loading mod_xml2enc fixes the
issue.

RFC 7303 "XML Media Types" - https://www.rfc-editor.org/rfc/rfc7303.html

I think the module should follow the rules there: anything which isn't
text/xml, application/xml or contains "+xml" should be ignored.

Nick Kew - any chance you could expand on your comment about what types you
expect to match which aren't?

2020-12-21 Thread bugzilla

--- Comment #10 from Joseph Heenan  ---
(In reply to Nick Kew from comment #7)
> Github 150 won't work: it precludes xml doctypes that should be processed.

@Nick Can you please be very explicit and give a few examples of MIME types
that should be processed but now aren't?

2020-12-21 Thread bugzilla

--- Comment #9 from Joseph Heenan  ---
(In reply to Karlo from comment #8)
> Data point: disabling mod_xml2enc fixes this problem for me.

Disabling mod_xml2enc fixed it for me too.

2020-12-19 Thread bugzilla

--- Comment #8 from Karlo  ---
Data point: disabling mod_xml2enc fixes this problem for me.

2020-12-17 Thread bugzilla

Nick Kew  changed:

   What|Removed |Added

 Status|NEW |NEEDINFO

--- Comment #7 from Nick Kew  ---
Github 150 won't work: it precludes xml doctypes that should be processed.

Giovanni Bechis's patch looks like a possible candidate for this particular
issue.  Though I'm not sure you need to test both the "application" and the
vnd.openxml.  And shouldn't the test be case-insensitive?

It does, however, open the question of whether *ALL* vnd.openxml types should
be excluded.  Is there no use case for running any of them through a
markup-aware filter in which mod_xml2enc is required for i18n support?

A second question arises here: if mod_xml2enc risks trashing your docs, then
surely so does mod_proxy_html.  Can those who reported or reproduced the
problem tell us what happens if you apply the identical configuration but don't
load mod_xml2enc at all, so mod_proxy_html runs without i18n support? 
mod_proxy_html should remove itself from the filter chain: does your debug
output confirm this?

Marking this NEEDINFO in the hope of feedback on those last two paragraphs.

2020-12-17 Thread bugzilla

Joseph Heenan  changed:

   What|Removed |Added

 CC||jos...@heenan.me.uk

--- Comment #6 from Joseph Heenan  ---
Thanks Joe for merging this!

I also hadn't seen Giovanni's comment.

I've re-read the code in my patch and it still looks right to me. The comments
in the code also appear to match the behaviour.

2020-12-16 Thread bugzilla

--- Comment #5 from Joe Orton  ---
Committed in r1884505, but oh, I missed the comment from Giovanni, sorry.  

"Excel OOXML mime type starts with "application" so it won't match that
condition,"

If the type starts with "application/", then strncmp(ctype, "text/", 5) will be
non-zero (true), so the change looks right, or am I missing something?

2020-12-13 Thread bugzilla

--- Comment #4 from Giovanni Bechis  ---
Created attachment 37603
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37603&action=edit
fix for openxml documents

The Excel OOXML MIME type starts with "application", so it won't match that
condition; this diff should work for your use case.

2020-12-02 Thread bugzilla

--- Comment #3 from Joseph Heenan  ---
I submitted a possible fix here:

https://github.com/apache/httpd/pull/150

2020-12-02 Thread bugzilla

--- Comment #2 from Joseph Heenan  ---
I've just run into this same problem.

The issue is that the file is being UTF-8 encoded: notice how the byte
0xb1 is being turned into 0xc2 0xb1, which is its UTF-8 encoding (
https://www.compart.com/en/unicode/U+00B1 ).

The problem is not the file extension but the MIME type. In my case I was
testing with an Excel OOXML file, which has the content-type:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Note how this contains 'xml' in it.

The problem seems to actually be xml2enc (which is automatically enabled when
you do 'ProxyHTMLEnable On') - in particular this line of code:

https://github.com/apache/httpd/blob/trunk/modules/filters/mod_xml2enc.c#L347

namely:

/* only act if starts-with "text/" or contains "xml" */
if (strncmp(ctype, "text/", 5) && !strstr(ctype, "xml"))  {

The 'strstr' is matching any content-type that contains xml.

I'm unclear on the original intent of this line. It might make sense to look
for +xml rather than just xml, which would definitely fix this bug.

I appear to have been able to workaround this bug by disabling the xml2enc
module. (I think I don't need it in my use case.)

2020-04-10 Thread bugzilla

--- Comment #1 from Karlo  ---
Was able to cleanly reproduce:

1) New VM "loadbalancer", default centos7 install, yum install httpd
mod_proxy_html, config:

# tail -n 18 conf/httpd.conf 

ProxyHTMLEnable On
ProxyHTMLInterp On
ProxyPreserveHost Off
ProxyPass / http://10.0.60.188:80/
ProxyPassReverse / http://10.0.60.188:80/
ProxyHTMLURLMap http://10.0.60.188:80/ /
DocumentRoot /defaultdir/

# Supplemental configuration
#
# Load config files in the "/etc/httpd/conf.d" directory, if any.
IncludeOptional conf.d/*.conf

2) Install VM "webserver" on 10.0.60.188 with a default install (yum install httpd).
Add an OpenOffice-generated newdoc.docx file as /var/www/html/newdoc.docx. Make
a copy of it named newdoc.XXX [!!]
3) Request the files http://loadbalancer/newdoc.docx and
http://loadbalancer/newdoc.XXX
4) Notice that the headers are different:


[desktop]$ wget http://10.0.60.189/newdoc.docx
[desktop]$ wget http://10.0.60.189/newdoc.XXX
[desktop]$ xxd newdoc.docx | head -n3
00000000: 504b 0304 1400 0808 0800 c2b1 c29c c28a  PK..............
00000010: 5000 0000 0000 0000 0000 0000 000b 0000  P...............
00000020: 005f 7265 6c73 2f2e 7265 6c73 c2ad c292  ._rels/.rels....
[desktop]$ xxd newdoc.XXX | head -n3
00000000: 504b 0304 1400 0808 0800 b19c 8a50 0000  PK...........P..
00000010: 0000 0000 0000 0000 0000 0b00 0000 5f72  .............._r
00000020: 656c 732f 2e72 656c 73ad 924d 4b03 410c  els/.rels..MK.A.
[desktop]$ file * 
newdoc.docx: Zip archive data, at least v2.0 to extract
newdoc.XXX:  Microsoft Word 2007+

2020-04-10 Thread bugzilla

Karlo  changed:

   What|Removed |Added

 CC||karlo_bzapache@luiten.family
