I added log statements to that method as well as its caller, InnerTextBuilder::Build <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/content_extraction/inner_text_builder.cc;l=36;drc=1651676a30cd7abcd177975f7cd0e37bd945f663>. Build() gets the innerText of the HTMLElement it receives, then iterates through its child_iframes and calls ShouldContentExtractionIncludeIframe on each one. According to the logs, the innerText of that element already contains the iframe text before Build() starts iterating through the iframes.
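To make the order of operations above concrete, here is a minimal standalone sketch. It is not Blink code: ToyFrame, ToyShouldIncludeIframe and ToyBuild are hypothetical names, and the sketch assumes (this is not confirmed against the real InnerTextBuilder) that the per-iframe check only decides whether a child's text gets added. Under that assumption, text that is already in the string passed in is never filtered out by the check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a frame; NOT a Blink type.
struct ToyFrame {
  std::string origin;                   // e.g. "https://a.example"
  std::string own_text;                 // text owned by this frame's document
  std::vector<ToyFrame> child_iframes;  // nested frames
};

// Toy origin check playing the role the thread describes for
// ShouldContentExtractionIncludeIframe: only same-origin children pass.
bool ToyShouldIncludeIframe(const ToyFrame& parent, const ToyFrame& child) {
  return parent.origin == child.origin;
}

// Mirrors the order described in the message: take the text first, then
// consult the check for each child iframe. ASSUMPTION: the check only
// gates what is appended; it never removes anything from inner_text.
std::string ToyBuild(const ToyFrame& root, const std::string& inner_text) {
  std::string result = inner_text;  // step 1: text exactly as received
  for (const ToyFrame& child : root.child_iframes) {
    if (ToyShouldIncludeIframe(root, child)) {
      result += "\n" + child.own_text;  // step 2: append allowed children
    }
  }
  return result;
}
```

In this model, calling ToyBuild with a cross-origin child and an inner_text that already contains the child's text returns that text unchanged, even though the check returns false for the child. That would match the logs described above, where the check behaves correctly but the iframe text is present anyway.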
On Android ShouldContentExtractionIncludeIframe gets called with 3rd party iframes, and it correctly determines that the origins are different so it returns false. On desktop, ShouldContentExtractionIncludeIframe only gets called on about:blank frames.

--Salvador

On Tue, Nov 26, 2024 at 10:23 AM Dave Tapuska <dtapu...@chromium.org> wrote:

> You really need to debug ShouldContentExtractionIncludeIframe
> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/content_extraction/document_chunker.cc;drc=1651676a30cd7abcd177975f7cd0e37bd945f663;bpv=1;bpt=1;l=61>
>
> On Tue, Nov 26, 2024 at 1:15 PM 'Salvador Guerrero Ramos' via blink-dev <blink-dev@chromium.org> wrote:
>
>> Hi
>>
>> I've been working on a prototype that uses the Element.innerText
>> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/editing/element_inner_text.cc;l=466;drc=277f4ab48eb85f7441f78aed191c31068ce89814>
>> API to get text from a web page (I'm calling this API with InnerTextAgent
>> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/content_extraction/inner_text_agent.cc;l=56;drc=277f4ab48eb85f7441f78aed191c31068ce89814>).
>> In some web pages the resulting text includes text from cross-origin
>> iframes (e.g. embedded tweets). My expectation is that this would work
>> similarly to the InnerText JS API
>> <https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/innerText>,
>> which does not return iframe text.
>>
>> I'm able to reproduce this on Desktop and Android builds of Chromium, but
>> only on certain websites.
>>
>> Here's a couple of examples:
>> https://www.sfgate.com/weather/article/sf-flood-warning-california-atmospheric-river-19937062.php
>>
>> https://www.si.com/nba/celtics/news/celtics-jayson-tatum-reveals-the-simple-reason-boston-took-down-undefeated-cavaliers
>>
>> Is this the right API to use for this scenario?
>> I'd like to replicate the behavior of the JS API.
>>
>> --
>> You received this message because you are subscribed to the Google Groups "blink-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+unsubscr...@chromium.org.
>> To view this discussion visit
>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CADBxXZE-fbBq1zRXJq%2Bk57RAVCbom2a%3DNLkgcMKKVss3ifbAhg%40mail.gmail.com.