[ https://issues.apache.org/jira/browse/PDFBOX-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18068213#comment-18068213 ]

Maruan Sahyoun edited comment on PDFBOX-6178 at 3/25/26 8:22 AM:
-----------------------------------------------------------------

{quote}
Beginning with PDF 1.2 a name object is an atomic symbol uniquely defined by a sequence of any characters (8-bit values) except null (character code 0). Uniquely defined means that any two name objects that, after all escaping is expanded (see below), and the resulting sequences of bytes are not an exact binary match denote different objects.
{quote}

"... after all escaping is expanded ..." is the part that matters here IMHO. In this regard, COSName#getName is a convenience method that allows String comparison and a String representation where needed. Of course, if preferred for trunk, we could change that and return the escaped form of the byte[], as COSName#writePDF does. Or we could add another method, #asEscapedString, and use that in PDFDebugger in order to limit the changes necessary in places where an unescaped String representation is needed, e.g. font names, Separation, DeviceN ...
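As an illustration only, a possible shape for the #asEscapedString idea, written as a standalone helper over the raw name bytes. The class and method names are hypothetical (not existing PDFBox API), and the escaping rule used here (escape '#', the delimiters and anything outside 0x21..0x7E as #xx) is an assumption based on the spec's name syntax; COSName#writePDF may escape a different set of characters.

```java
public class NameEscaping {

    // Hypothetical helper, not part of PDFBox: renders the raw bytes of a
    // name the way they would appear in a PDF file, e.g. "/m#E4nnlich".
    static String asEscapedString(byte[] nameBytes) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : nameBytes) {
            int c = b & 0xFF;
            // Assumed rule: regular printable characters (0x21..0x7E) that are
            // neither delimiters nor '#' are written as-is ...
            if (c >= 0x21 && c <= 0x7E && "()<>[]{}/%#".indexOf(c) < 0) {
                sb.append((char) c);
            } else {
                // ... everything else becomes '#' followed by two hex digits
                sb.append('#').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "männlich" as single bytes: 'ä' is the byte 0xE4
        byte[] name = {'m', (byte) 0xE4, 'n', 'n', 'l', 'i', 'c', 'h'};
        System.out.println(asEscapedString(name)); // prints /m#E4nnlich
    }
}
```

Working on the byte[] rather than on the String returned by getName keeps the escaped form an exact picture of what is stored, independent of any charset.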

Having said that, the escaped form might be useful for debugging purposes, but after reading, the unescaped form is what matters - at least that's how I interpret the spec.
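A minimal sketch of the spec rule quoted above, assuming names are compared only after expanding #xx escapes into raw bytes. The helper names are hypothetical, this is not PDFBox's parser, and it assumes well-formed, ASCII-only tokens (every '#' followed by two hex digits).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class NameEquality {

    // Expands #xx escapes in a name token (given without the leading '/')
    // into raw bytes. Assumes well-formed ASCII input.
    static byte[] unescape(String token) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < token.length(); i++) {
            char ch = token.charAt(i);
            if (ch == '#') {
                // two hex digits after '#' form one byte
                out.write(Integer.parseInt(token.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(ch);
            }
        }
        return out.toByteArray();
    }

    // Two names denote the same object only if the expanded bytes match exactly.
    static boolean sameName(String a, String b) {
        return Arrays.equals(unescape(a), unescape(b));
    }

    public static void main(String[] args) {
        System.out.println(sameName("m#e4nnlich", "m#E4nnlich"));    // true
        System.out.println(sameName("m#e4nnlich", "m#c3#a4nnlich")); // false
    }
}
```

Under this reading, "/m#e4nnlich" and "/m#c3#a4nnlich" are different names, which is why the button no longer matches in other viewers.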
 



> PdfBox renames RadioButton with Umlaut
> --------------------------------------
>
>                 Key: PDFBOX-6178
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6178
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 2.0.36, 3.0.5 PDFBox, 3.0.6 PDFBox, 3.0.7 PDFBox
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 2.0.37, 3.0.8 PDFBox, 4.0.0
>
>         Attachments: form_empty.pdf, form_selected_ASCII_NUL_acrobat.pdf, 
> form_selected_acrobat_pro.pdf, form_selected_pdfbox.pdf, 
> form_selected_pdfbox_patched.pdf
>
>
> From the users mailing list:
> 1. Create a document that contains a radio button with an umlaut in its name. I can give you an example document.
> For example: a radio group "Geschlecht" with the buttons "männlich" and "weiblich".
> Do not use PDFBox for this step. I used Acrobat Pro 2020.
> The name/value of the "männlich" button is encoded as "/m#e4nnlich" in the PDF.
> 2. Update the value of the radio group with PDFBox to "männlich" and save it to a new document.
> {code}
> import java.io.File;
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> public class UpdateRadioGroup {
>     private static final String INPUT_FILE = "form_empty.pdf";
>     private static final String OUTPUT_FILE = "form_selected.pdf";
>     private static final String FIELD_NAME = "Geschlecht";
>     private static final String FIELD_VALUE = "männlich";
>
>     public static void main(String[] args) throws Exception {
>         try (PDDocument document = Loader.loadPDF(new File(INPUT_FILE))) {
>             // Look up the radio group in the AcroForm and select the "männlich" button
>             document.getDocumentCatalog()
>                     .getAcroForm(null)
>                     .getField(FIELD_NAME)
>                     .setValue(FIELD_VALUE);
>             // Save to a new file so the result can be inspected in a text editor
>             document.save(new File(OUTPUT_FILE));
>         }
>     }
> }
> {code}
> 3. Check the name/value of the "männlich" button in the new document with a text editor. PDFBox encodes "männlich" as "/m#c3#a4nnlich" (see COSName.writePDF()).
>
> The Problem
> ===============
> PDFBox renames the radio button from "männlich" to "mÃ¤nnlich", or, in PDF syntax, from "/m#e4nnlich" to "/m#c3#a4nnlich".
> When you read the document again, PDFBox converts "#c3#a4" back to "ä", but all other programs do not. I tested Acrobat Pro 2020, the current Acrobat Reader, and PDFXplorer from https://www.o2sol.com
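One way to reproduce the reported byte change outside PDFBox, assuming the name is held as a Java String after parsing and is turned back into bytes with UTF-8 when writing. This is consistent with the "#c3#a4" seen in the saved file, but it is not a trace of PDFBox's actual code; the class and both helpers are hypothetical stand-ins.

```java
import java.nio.charset.StandardCharsets;

public class NameRoundTrip {

    // Hypothetical stand-in for a writer's name escaping: bytes outside
    // 0x21..0x7E, delimiters and '#' are written as #xx.
    static String escape(byte[] bytes) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : bytes) {
            int c = b & 0xFF;
            if (c >= 0x21 && c <= 0x7E && "()<>[]{}/%#".indexOf(c) < 0) {
                sb.append((char) c);
            } else {
                sb.append('#').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // Assumed round trip: decode the single-byte name to a String,
    // then re-encode it with UTF-8 ('ä' U+00E4 becomes 0xC3 0xA4).
    static byte[] viaUtf8String(byte[] nameBytes) {
        String name = new String(nameBytes, StandardCharsets.ISO_8859_1);
        return name.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Raw bytes of /m#e4nnlich as written by Acrobat: 'ä' is the single byte 0xE4
        byte[] original = {'m', (byte) 0xE4, 'n', 'n', 'l', 'i', 'c', 'h'};

        System.out.println(escape(original));                // /m#E4nnlich
        System.out.println(escape(viaUtf8String(original))); // /m#C3#A4nnlich
    }
}
```

ISO-8859-1 maps every byte value 0x00..0xFF to a character and back without loss, so keeping the original byte[] (or using the same single-byte charset in both directions) would preserve "/m#E4nnlich"; mixing charsets is what changes the name.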



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
