[
https://issues.apache.org/jira/browse/SOLR-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058339#comment-16058339
]
Hoss Man commented on SOLR-10934:
---------------------------------
based on the code in this SO post, it looks like we should be able to:
* loop over all PDAnnotations in each PDPage
* if the annotation is a PDAnnotationLink then we can access its PDAction and
PDDestination
* PDActionURI is an external link, PDActionGoTo is an inter-document link
* PDActionGoTo can point at either a PDPageDestination (page num?) or a
PDNamedDestination (named anchor?)
* we look up PDNamedDestination instances in the document catalog.
that _should_ enable us to vet that all inter-document links point to a valid
anchor.
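
a rough sketch of the vetting logic described above, in plain Java. it deliberately does not call PDFBox: Kind, Link and findBrokenLinks are hypothetical stand-ins (a NAMED link plays the role of a PDActionGoTo pointing at a PDNamedDestination, a PAGE link one pointing at a PDPageDestination, and catalogNames the names registered in the document catalog). how those values would actually be read out of the PDF is left open here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the link check; none of these types come from PDFBox. */
final class PdfLinkCheckSketch {

    /** What a link annotation can point at. */
    enum Kind { EXTERNAL_URI, PAGE, NAMED }

    /** One link found while looping over the annotations of a page. */
    static final class Link {
        final int sourcePage;  // 1-based page the link sits on
        final Kind kind;
        final String target;   // a URI, a 1-based page number, or an anchor name

        Link(int sourcePage, Kind kind, String target) {
            this.sourcePage = sourcePage;
            this.kind = kind;
            this.target = target;
        }
    }

    /**
     * Returns one message per inter-document link whose target does not exist.
     * External URI links are skipped; only page and named targets are vetted.
     */
    static List<String> findBrokenLinks(List<Link> links, Set<String> catalogNames, int pageCount) {
        List<String> problems = new ArrayList<>();
        for (Link link : links) {
            if (link.kind == Kind.NAMED) {
                // named anchor: must be registered in the catalog
                if (!catalogNames.contains(link.target)) {
                    problems.add("page " + link.sourcePage + ": no anchor named '" + link.target + "'");
                }
            } else if (link.kind == Kind.PAGE) {
                // page destination: must fall inside the document
                int page = Integer.parseInt(link.target);
                if (page < 1 || page > pageCount) {
                    problems.add("page " + link.sourcePage + ": no page " + page);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up sample data, only to exercise the check.
        Set<String> names = new HashSet<>(Arrays.asList("getting-started", "faq"));
        List<Link> links = Arrays.asList(
            new Link(1, Kind.EXTERNAL_URI, "https://example.org/"),
            new Link(2, Kind.NAMED, "faq"),
            new Link(3, Kind.NAMED, "missing-anchor"),
            new Link(4, Kind.PAGE, "99"));
        for (String problem : findBrokenLinks(links, names, 10)) {
            System.out.println(problem);
        }
    }
}
```

keeping the check separate from the PDF parsing means the same function could be fed from whatever the PDFBox traversal ends up producing.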
one thing i'm not sure about is if it would be possible to check for the "anchor
used more than once in diff adoc files" type problem -- i suspect that the
catalog's list of PDNamedDestination doesn't allow dups, so that info may
already be lost as part of the PDF creation?
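
to illustrate that suspicion (it is only a suspicion, not something verified against PDFBox): if the catalog behaves like a name-keyed map, a second anchor with the same name simply overwrites the first, and the duplicate can only be detected from data that still records which file declared each anchor. the Anchor type and the file names below are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the duplicate-anchor problem; not PDFBox code. */
final class DuplicateAnchorSketch {

    /** An anchor name together with the source file that declares it. */
    static final class Anchor {
        final String name;
        final String sourceFile;

        Anchor(String name, String sourceFile) {
            this.name = name;
            this.sourceFile = sourceFile;
        }
    }

    /** Name-keyed view: a later anchor with the same name replaces the earlier one. */
    static Map<String, String> nameKeyedCatalog(List<Anchor> anchors) {
        Map<String, String> catalog = new HashMap<>();
        for (Anchor a : anchors) {
            catalog.put(a.name, a.sourceFile);
        }
        return catalog;
    }

    /** Keeps every declaring file per name and returns only names declared more than once. */
    static Map<String, List<String>> findDuplicates(List<Anchor> anchors) {
        Map<String, List<String>> filesByName = new TreeMap<>();
        for (Anchor a : anchors) {
            filesByName.computeIfAbsent(a.name, k -> new ArrayList<>()).add(a.sourceFile);
        }
        filesByName.values().removeIf(files -> files.size() < 2);
        return filesByName;
    }
}
```

in this model the name-keyed map ends up with one entry per name, so the duplicate check has to work from the per-file list instead.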
> create a link+anchor checker for the ref-guide PDF using PDFBox
> ---------------------------------------------------------------
>
> Key: SOLR-10934
> URL: https://issues.apache.org/jira/browse/SOLR-10934
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: documentation
> Reporter: Hoss Man
>
> We currently have CheckLinksAndAnchors.java which is automatically run
> against the ref-guide HTML as part of the build to use JSoup to find bad
> links/anchors that asciidoctor doesn't complain about -- but not everyone
> does/can build the HTML version of the ref-guide since it requires
> manually installing jekyll.
> The PDF build only requires things installed by ivy (via JRuby) and we
> already have some PDFBox based code in ReducePDFSize.java that operates on
> this PDF every time it's run -- so if we can find a way to do similar checks
> using the PDFBox API we could catch these broken links faster.