[jira] [Comment Edited] (PDFBOX-2530) Improve PDFDebugger

John Hewson (JIRA) Tue, 28 Jul 2015 23:34:18 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645545#comment-14645545
 ]


John Hewson edited comment on PDFBOX-2530 at 7/29/15 6:33 AM:
--------------------------------------------------------------

{quote}
Is there anything that you miss?
{quote}

CMap and ToUnicode, as mentioned above. Maybe also glyph previews.

{quote}
But isn't it more practical to have everything in the font node, so you have 
less keyboard / mouse clicks to find the information?
{quote}

I don't think we'll be able to fit CMap/Encoding, ToUnicode, and CIDToGIDMap 
all on the font node. We already have a tree structure for navigating them, so 
it makes sense to keep them separate. I'd prefer to keep navigation in the tree 
rather than re-create a new level of navigation inside the view. What might be 
nice on the font node is some sort of overview, but I'm not sure what I'd want 
to see there - maybe tiles with previews of the glyphs?

{quote}
Re "Character" - the method used is toUnicode(), because that one returns one 
(maybe many) unicode characters when text extraction is done.
{quote}

Encoding and ToUnicode need to be kept separate, because they are too often 
confused. We could have both "Character" and "ToUnicode" columns (when present) 
to clarify?
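
To illustrate the distinction, here is a minimal sketch of a table that keeps the two lookups apart. It is a toy model, not the PDFBox API: the class `FontCodeTable`, its methods, and the sample codes are all hypothetical. The encoding maps a character code to a glyph name, while ToUnicode maps the same code to the text that extraction would return, which may be several characters or missing entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model (not PDFBox API): two independent lookups keyed by character code.
class FontCodeTable {
    // Encoding: character code -> glyph name (which glyph gets drawn).
    private final Map<Integer, String> encoding = new LinkedHashMap<>();
    // ToUnicode: character code -> Unicode text (what text extraction returns).
    private final Map<Integer, String> toUnicode = new LinkedHashMap<>();

    void addEncoding(int code, String glyphName) {
        encoding.put(code, glyphName);
    }

    void addToUnicode(int code, String text) {
        toUnicode.put(code, text);
    }

    // One table row: code, glyph name, ToUnicode text; "-" marks a missing entry.
    String row(int code) {
        String glyph = encoding.getOrDefault(code, "-");
        String text = toUnicode.getOrDefault(code, "-");
        return code + " | " + glyph + " | " + text;
    }

    public static void main(String[] args) {
        FontCodeTable table = new FontCodeTable();
        // Made-up sample values for illustration only.
        table.addEncoding(65, "A");
        table.addToUnicode(65, "A");
        table.addEncoding(174, "fi");
        table.addToUnicode(174, "fi"); // one glyph, two Unicode characters
        table.addEncoding(200, "g200"); // glyph without a ToUnicode entry

        System.out.println("Code | Glyph name | ToUnicode");
        for (int code : new int[] {65, 174, 200}) {
            System.out.println(table.row(code));
        }
    }
}
```

Keeping the two maps separate means a row can show a glyph name even when no ToUnicode entry exists, which is the case that a single merged "Character" column would hide.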


was (Author: jahewson):
{quote}
Is there anything that you miss?
{quote}

CMap and ToUnicode, as mentioned above. Maybe also glyph previews.

{quote}
But isn't it more practical to have everything in the font node, so you have 
less keyboard / mouse clicks to find the information?
{quote}

I don't think we'll be able to fit CMap/Encoding, ToUnicode, and CIDToGIDMap 
all on the font node. We already have a tree structure for navigating them, so 
it makes sense to keep them separate. I'd prefer to keep navigation in the tree 
rather than re-create a new level of navigation inside the view. What might be 
nice on the font node is some sort of overview, but I'm not sure what I'd want 
to see there - maybe tiles with previews of the glyphs?

{code}
Re "Character" - the method used is toUnicode(), because that one returns one 
(maybe many) unicode characters when text extraction is done.
{code}

Encoding and ToUnicode need to be kept separate, because they are too often 
confused. We could have both "Character" and "ToUnicode" columns (when present) 
to clarify?

> Improve PDFDebugger
> -------------------
>
>                 Key: PDFBOX-2530
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2530
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.8.8, 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: khyrul bashar
>              Labels: gsoc2015
>         Attachments: 317669-p213.pdf, Avoiding_NPE_for_null_Field_Type.diff, 
> BracketsColorChooser.png, Class_cast_exception_in_page_mode_avoided.diff, 
> Content_stream_showing_issues_patch.diff, 
> Content_stream_showing_issues_patch_updated.diff, DeviceNCS.diff, 
> FlagBitsPane-26-06-2015.diff, Flag_bits_showing_feature-redesigned.diff, 
> Flag_bits_showing_feature.diff, Font_encoding_pane_draft.diff, 
> Font_encoding_pane_feature_.diff, Javadoc_for_ZoomMenu_class.diff, 
> K4SystemFontsNotEmbeded218.pdf, PDFDebugger_StatusBar.png, 
> PDFDebugger_StatusBar_01.png, 
> Parent_dictionary_type_checking_for__f__and__flags.diff, 
> Show_thumbnail_image.diff, SonarQube_issues_resolved.diff, 
> Sonarqube_warning_resolved.diff, Stream_Showing_Feature.diff, 
> Stream_text_showing_for_broken_streams.diff, 
> Zoom_menu_refactored_and_enabled_for_images.diff, filters-screenshot.png, 
> indexedcs.diff, openSelectedPath.diff, parent_node_redirect.diff, 
> parent_node_redirect_expand_disabled.diff, 
> refactor_ZoomMenu_to_avoid_code_redundancy.diff, 
> removed_redundant_codes.patch, separationCS.diff, 
> sonarqube_issue_resolve_26_07.diff, 
> sonarqube_warning_for_method_length_resolved.diff, 
> sonarqube_warning_resolve.diff, tree.diff, treestatus.diff, 
> treestatuspane.diff
>
>
> (This is an idea for the [Google Summer of Code 
> 2015|https://www.google-melange.com/])
> Our command line utility PDFDebugger (part of the command line pdfbox-app; get 
> it [here|https://pdfbox.apache.org/downloads.html], read the description 
> [here|https://pdfbox.apache.org/commandline/], and see the source code 
> [here|https://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFDebugger.java?view=markup&sortby=date])
>  needs some improvements:
>    - hex view
>    - view of non printable characters
>    - ✓ saving streams
>    - binary copy & paste
>    - ✓ Create a status line that shows where we are in the tree. (Like in the 
> Windows REGEDIT)
>    - ✓ Copy the current tree string into the clipboard (useful in discussions 
> about details of a PDF)
>    - ✓ (Optional, not sure if easy) Jump to specific place in the tree by 
> entering tree string
>    - ✓ ability to search in streams (very useful for content streams and meta 
> data)
>    - ✓ show images that are streams
>    - ✓ show PDIndexed color lookup table, show the index value, the base and 
> RGB color value sets when the mouse moves
>    - ✓ show PDSeparation color
>    - ✓ show PDDeviceN colors
>    - (Optional, idea needs to be developed a bit) show a meaningful explanation 
> for some attributes, e.g. "appearance stream" when hovering over /AP
>    - ✓ show font encodings and characters
>    - ✓ display flag bits (e.g. Annotation flags) in a way that is easy to 
> understand. There are probably others, I assume that the main work needs to 
> be done only once
>    - edit attributes (should be possible to enter values as decimal, hex or 
> binary)
>    - edit streams, while keeping or changing the compression filter
>    - save altered PDF 
>    - ✓ color marking of certain PDF operators, especially q...Q and text 
> operators (BT...ET). Ideally, it should help the user understand the 
> "bracketing" of these operators, i.e. understand where a sequence starts and 
> where it ends. (See "operator summary" in the PDF Spec) Other "important" 
> operators I can think of are the matrix, font and color operators. A cool 
> advanced thing would be to show the current color or the font in a popup when 
> hovering above such an operator.
> To see a product with a similar purpose that is better than PDFDebugger, 
> watch [this video|https://www.youtube.com/watch?v=g-QcU9B4qMc].
> I'm not asking to implement a clone of that product (I don't use it, all I 
> know is that video), but we at PDFBox really need something that makes PDF 
> debugging easier. As an example of how the current PDFDebugger prevented me 
> from finding a bug quickly, see PDFBOX-2401 and search for "PDFDebugger".
> Prerequisites:
> - Java programming, especially the GUI components
> - the ability to understand existing source code
> Using external software components is possible (they must have the Apache License or a 
> compatible one), but this should be decided on a case-by-case basis; we don't want 
> to get too big.
> Development strategy: go from the easy to the difficult. The wished features 
> are already sorted this way (mostly).
> Get introduced: [download the source code with 
> svn|https://pdfbox.apache.org/downloads.html#scm] and build it with maven. 
> Run PDFDebugger and view some PDFs to see the components of a PDF. Start with 
> the file of PDFBOX-2401. Read up on the structure of PDF on the 
> web or in the [PDF 
> Specification|https://www.adobe.com/devnet/pdf/pdf_reference.html].
> Mentor: Tilman Hausherr (European timezone, languages: German, English, 
> French). To see the GSoC2014 project I mentored, go to PDFBOX-1915.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

