[ 
https://issues.apache.org/jira/browse/PDFBOX-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18067575#comment-18067575
 ] 

Maruan Sahyoun commented on PDFBOX-6178:
----------------------------------------

The compareTo() implementation has a potential flaw compared to the 3.0 
implementation: two different byte[] values can end up with the same 
String representation. But you're right, it's no worse than the current version 
in 2.0. What about the code below instead of the String comparison?
{code:java}
        int len = Math.min(nameBytes.length, other.nameBytes.length);
        for (int i = 0; i < len; i++)
        {
            int diff = (nameBytes[i] & 0xFF) - (other.nameBytes[i] & 0xFF);
            if (diff != 0)
            {
                return diff;
            }
        }
        return nameBytes.length - other.nameBytes.length;
{code}
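
To illustrate the concern, here is a minimal, self-contained sketch. It is not the PDFBox code: the class and method names are made up for the example, and it assumes the String form of a name is obtained by decoding the bytes as UTF-8. Under that assumption, two names that differ only in a byte that is invalid in UTF-8 both decode to the replacement character U+FFFD and compare as equal, while the byte-wise comparison still tells them apart.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox.
final class NameBytesCompareSketch
{
    // Unsigned, lexicographic comparison of the raw name bytes,
    // same logic as the snippet proposed above.
    static int compareBytes(byte[] a, byte[] b)
    {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++)
        {
            // & 0xFF turns the signed Java byte into its unsigned value 0..255
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0)
            {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // String-based comparison; decoding as UTF-8 is an assumption of this sketch.
    static int compareAsUtf8Strings(byte[] a, byte[] b)
    {
        String left = new String(a, StandardCharsets.UTF_8);
        String right = new String(b, StandardCharsets.UTF_8);
        return left.compareTo(right);
    }

    public static void main(String[] args)
    {
        // The raw bytes of "/m#e4n" and "/m#f6n": 0xE4 followed by 'n' and a
        // lone 0xF6 are both malformed UTF-8, so each decodes to U+FFFD.
        byte[] a = { 'm', (byte) 0xE4, 'n' };
        byte[] b = { 'm', (byte) 0xF6, 'n' };
        System.out.println("String comparison: " + compareAsUtf8Strings(a, b));
        System.out.println("byte comparison:   " + compareBytes(a, b));
    }
}
```

The masking with 0xFF matters as well: without it, bytes at or above 0x80 are negative in Java and would sort before the ASCII range.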

> PdfBox renames RadioButton with Umlaut
> --------------------------------------
>
>                 Key: PDFBOX-6178
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6178
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 2.0.36, 3.0.5 PDFBox, 3.0.6 PDFBox, 3.0.7 PDFBox
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 2.0.37, 3.0.8 PDFBox, 4.0.0
>
>         Attachments: form_empty.pdf, form_selected_ASCII_NUL_acrobat.pdf, 
> form_selected_acrobat_pro.pdf, form_selected_pdfbox.pdf
>
>
> From the users mailing list:
> 1. Create a document that contains a radio button with Umlaut in name. I can 
> give you an example document.
> Let's say: A radio group "Geschlecht" with the buttons "männlich" and 
> "weiblich".
> Do not use PdfBox for this step. I used Acrobat Pro 2020.
> The name/value of the "männlich" button is encoded as "/m#e4nnlich" in the 
> PDF.
> 2. Update the value of the radio group with PdfBox to "männlich" and save it 
> to a new document.
> {code}
> import java.io.File;
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.pdmodel.PDDocument;
> public class UpdateRadioGroup {
> private static final String INPUT_FILE = "form_empty.pdf";
> private static final String OUTPUT_FILE = "form_selected.pdf";
> private static final String FIELD_NAME = "Geschlecht";
> private static final String FIELD_VALUE = "männlich";
> public static void main(String[] args) throws Exception {
>          try (PDDocument document = Loader.loadPDF(new File(INPUT_FILE))) {
>              document.getDocumentCatalog()
>                      .getAcroForm(null)
>                      .getField(FIELD_NAME)
>                      .setValue(FIELD_VALUE);
>              document.save(new File(OUTPUT_FILE));
>          }
>      }
>  }
> {code}
> 3. Validate the name/value of the "männlich" button in the new document in a 
> text editor. PdfBox encodes "männlich" to "/m#c3#a4nnlich" (see 
> COSName.writePDF() ).
> The Problem
>  ===============
>  PdfBox re-encodes the name of the "männlich" radio button:
>  "/m#e4nnlich" becomes "/m#c3#a4nnlich" in PDF-format.
>  When you read the document again, PdfBox converts "#c3#a4" to "ä" but
>  all other programs do not. I tested Acrobat Pro 2020, the current Acrobat
>  Reader, PDFXplorer from https://www.o2sol.com
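
The two encodings from the report can be reproduced with a small sketch. This is a simplified illustration, not the COSName.writePDF() code: the class and the escape() helper are hypothetical, and the helper only hex-escapes bytes outside printable ASCII plus '#', ignoring the other delimiter characters that a real PDF writer also has to escape. It shows that the same name yields "#e4" when the bytes are taken as ISO-8859-1 and "#c3#a4" when they are taken as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox.
final class NameEscapeSketch
{
    // Simplified name escaping: printable ASCII is written as-is,
    // every other byte (and '#') is written as #xx in lowercase hex.
    static String escape(String name, Charset charset)
    {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : name.getBytes(charset))
        {
            int v = b & 0xFF;
            if (v > 0x20 && v < 0x7F && v != '#')
            {
                sb.append((char) v);
            }
            else
            {
                sb.append('#').append(String.format("%02x", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        String name = "m\u00e4nnlich"; // "männlich"
        // Single-byte encoding, matching what Acrobat wrote in the report
        System.out.println(escape(name, StandardCharsets.ISO_8859_1)); // /m#e4nnlich
        // UTF-8 encoding, matching what PdfBox wrote in the report
        System.out.println(escape(name, StandardCharsets.UTF_8));      // /m#c3#a4nnlich
    }
}
```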



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
