Re: XMLParsedAsHTMLWarning during import of ofx from ofxget

Red S Sun, 05 Jun 2022 22:33:24 -0700

Hmm, I haven't come across this issue so far.

It's the ofxparse library <https://github.com/jseutter/ofxparse> that uses 
BS4, so I'd ask there. They did deliberately decide 
<https://github.com/jseutter/ofxparse/pull/108> to parse this as HTML even 
though it's XML, but that code has worked fine for years now. What platform 
are you using?


If everything else works fine, I'd also consider filtering the warning out 
in the shell (this uses bash process substitution):
bean-extract [blah blah...] 2> >(grep -v XMLParsedAsHTMLWarning >&2)


On Sunday, June 5, 2022 at 6:10:35 PM UTC-7 coltonc...@gmail.com wrote:

> Hey all,
>
> I'm getting the following warning:
> venv/lib/python3.10/site-packages/bs4/builder/__init__.py:545: 
> XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using 
> an HTML parser. If this really is an HTML document (maybe it's XHTML?), you 
> can ignore or filter this warning. If it's XML, you should know that using 
> an XML parser will be more reliable. To parse this document as XML, make 
> sure you have the lxml package installed, and pass the keyword argument 
> `features="xml"` into the BeautifulSoup constructor.
>   warnings.warn(
>
> What I'm doing to get this:
>
>    - Downloading account data using ofxget as described here 
>    <https://reds-rants.netlify.app/personal-finance/direct-downloads/>
>    - Importing that data using beancount-reds-importer (e.g. here 
>    <https://github.com/redstreet/beancount_reds_importers/blob/main/beancount_reds_importers/chase/__init__.py>)
>
> Things I've tried or discovered:
>
>    - I looked for all instances of `soup = BeautifulSoup(...)` and found 
>    the main calls in ofx.py. I tried changing these calls from feature=lxml 
>    to feature=xml, which didn't resolve the warning
>    - I made sure lxml is installed
>    - I tried to suppress the warning with warnings.filterwarnings, but 
>    that didn't work either (and I'm not sure it would be the "right" fix anyway)
>    - I found a PR in an unrelated repo where they solved it by suppressing 
>    the warning: <https://github.com/EnergieID/entsoe-py/issues/180>
>    - I tried ofx data downloaded from both Fidelity Investments and Chase 
>    (not expecting this to be institution specific)
>
> Questions I have:
>
>    - The warning doesn't really help me understand which call into 
>    BeautifulSoup caused it. Any tips on how to track down where it is 
>    coming from? Maybe ofx.py isn't part of the issue at all.
>    - I think bean-extract is still working, but any suggestions on whether 
>    the warning should be ignored or resolved would also be appreciated.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Beancount" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/beancount/251cbe25-99ea-4dac-92e5-6b2e0c7f128cn%40googlegroups.com.
