[ 
https://issues.apache.org/jira/browse/PDFBOX-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898511#comment-17898511
 ] 

Tilman Hausherr commented on PDFBOX-5901:
-----------------------------------------

Extraction does work. Here's what I get for page 24 if I comment out the 
logging:
{noformat}
2024营销趋势洞察24
监控和管理您的品牌形象。
您想实时发现新的消费趋势吗?
我们最先进的视觉人工智能技术可帮助营销人员搜索和监控图像:
        . 标志和名人检测
        . 人脸检测(年龄、性别、密度)
        . 场景和物体检测
        . 情绪检测
        . 模因检测
        . 物体字符识别
        . 图像聚类
另外,您可以在我们的网红营销平台 Klear 中上传图像,以发现创建
类似内容的网红。
了解更多
{noformat}
The real problem is that the logging makes extraction very slow. I'll look for 
a way to reduce it.
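
In the meantime, a possible workaround on the caller's side is to raise the log 
level for the PDFBox and FontBox loggers before extracting. The following is only 
a minimal sketch, assuming commons-logging ends up using java.util.logging (that 
is, no Log4j or SLF4J binding is on the classpath). The class name 
QuietPdfLogging is hypothetical and not part of PDFBox:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietPdfLogging {
    // java.util.logging only holds loggers weakly, so keep strong references;
    // otherwise a configured logger may be garbage collected and lose its level.
    private static final Map<String, Logger> PINNED = new HashMap<>();

    /** Sets the given level on the PDFBox and FontBox logger namespaces. */
    public static void setLevel(Level level) {
        for (String name : new String[] {"org.apache.pdfbox", "org.apache.fontbox"}) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(level); // child loggers without their own level inherit this
            PINNED.put(name, logger);
        }
    }

    public static void main(String[] args) {
        // Assumption: the repeated messages are logged at WARNING or below.
        setLevel(Level.SEVERE);
        // ... run the text extraction here ...
    }
}
```

Whether this removes enough of the overhead depends on how the messages are 
produced, so treat it as something to try rather than a confirmed fix.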

> there is an issue with font mapping or rendering
> ------------------------------------------------
>
>                 Key: PDFBOX-5901
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5901
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.31
>            Reporter: ltzzZ
>            Priority: Major
>         Attachments: PDFBOX-5901-p24.pdf, image-2024-11-15-12-38-12-100.png, 
> image-2024-11-15-12-38-36-179.png, image-2024-11-15-12-39-22-585.png
>
>
> When I try to extract the text content of a PDF file, the log keeps looping 
> through warnings about font rendering or mapping, and I can't get the content 
> of the file. How can I fix this problem?
>  
> My code:
>   !image-2024-11-15-12-38-36-179.png!
> Problem:
>   !image-2024-11-15-12-39-22-585.png!
> Sometimes the CPU usage is also abnormal:
>   !image-2024-11-15-12-38-12-100.png!



